OpenSearch cluster can't allocate shards after deleting an index

Problem:
After I deleted an index, all indices are in the red state and cluster allocation is stuck, jumping between two states.
Preliminaries:
I ran a split operation from index x (280 GB, 10 million documents, 1 shard) to index y, which was expected to have 8 shards. After the split ran, delete-by-query could not be performed and the index size kept jumping between 700 GB and 1.3 TB for a day. I decided to delete that index, and since then all indices have been stuck in the red state.
Question:
What caused it? How can it be fixed?
Cluster details
The cluster is managed with Helm and Kubernetes on AWS (I can share instance details if needed)
3 data nodes, 3 master nodes
1200 GB disk attached to each data node
32 GB RAM per data node, 8 GB per master node

After deeper investigation, I realised that my data nodes had probably stopped working correctly, and I had to delete the pods to force a restart. After that, the cluster started operating normally.

Related

`kops update cluster` returns multiple will create/modify resources

I have a Kubernetes cluster running version 1.17.17. I want to increase the CPU/RAM of a node using kops. When running the `kops update cluster` command, I expected it to return a preview of my old instance type versus the new instance type.
However, it returns a long list of "will create resources" and "will modify resources" entries.
I want to know why it shows a long log of changes it will execute instead of only the change I made to the instance type, and whether it is safe to apply the changes.
After you apply that cluster update, you will need to do a rolling update on the cluster. The nodes will be terminated one by one and new ones will come up. While a node is going down to be replaced, the services running on it will be rescheduled onto other nodes. A small tip: remove all PodDisruptionBudgets first. The log itself is fine, don't worry.

Best way to retire an index

I am retiring an old Elasticsearch index in AWS that has not received a new document since 2016. However, something is still trying to search it.
I still want to deprecate this index in a manner that lets me get back to the original state quickly. I have created a manual snapshot of the index and it is sitting in S3. I was planning on deleting the domain, but, from what I understand, that deletes everything billable under AWS, including the endpoint. As I mentioned above, I want to be able to get back to the original state of the index. This domain contains a series of indexes. The largest index is 20.5 GB. I was going to delete the large index and resize the cluster to a smaller instance size and footprint. Will this work, or will it be unsearchable?
I've no experience using Elasticsearch on AWS, but I have an idea about your index.
You say the index has received no new documents for a long time. If this also means no deletions and no updates, you could theoretically just take this index to a new cluster, using either snapshot + restore, or a cross-cluster reindex. Continue operating your old cluster until you're sure the new one is working well.
Again - not familiar with AWS terminology, but it sounds like this approach translates to using separate "domains". First you fully ensure the new "domain" is working with the right hardware spec and data, and then delete the old "domain".
TL;DR -> yes!
The backup to S3 will work, but the documents will be unsearchable because in order to downsize the storage you have to delete the index.
But if someday you want to restore the data from S3 back to the index, you can.
You can resize instances and storage sizes with no downtime; however, that takes a long time and you pay extra for the machines while they are resizing.
Example:
You change your storage size from 100 GB to 99 GB.
The Elasticsearch service will spin up another instance, copy all your data from the old instance to the new one, and then delete the old one.
The same goes for instance sizes:
machine up, cluster sync, machine down.
While they are syncing, you pay for them.
Your plan will work; ES is very flexible.
If you really don't trust AWS, just make a JSON export of the index and keep it on S3 too, in case things go south.

AWS elasticsearch log rotation

I want to use AWS Elasticsearch to store my application's logs. Since there is a huge amount of data going into AWS Elasticsearch (~30 GB daily), I only want to keep 3 days of data. Is there any way to schedule data removal from AWS Elasticsearch or do log rotation? What happens if the AWS Elasticsearch storage is full?
Thanks for the help
A possible way is to set the index parameter in the elasticsearch output to something like "logstash-%{appname}-%{date_format}". You can then use the Curator plugin to delete old indices, for example by age in days.
This SO pretty much explains the same. Hope it helps!
I assume you are using the AWS Amazon Elasticsearch Service?
The storage type is an EBS volume with a fixed size of disk space. If you want to keep only the last three days, I assume you have 3 indices then, like that
my-index-2017.01.30
my-index-2017.01.31
my-index-2017.02.01
Basically, you can write a simple script that deletes indices older than 3 days. With the REST API, in Sense it is just DELETE my-index-2017.01.30.
I recommend using Elasticsearch Curator for the job. See https://www.elastic.co/guide/en/elasticsearch/client/curator/current/delete_indices.html
I'm not sure if the Service interface itself has an option for that. But Elasticsearch Curator should do the job for you.
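The "simple script" idea above can be sketched roughly as follows. This is only a minimal illustration in Python, assuming the daily naming scheme my-index-YYYY.MM.DD from the example; the function name `indices_to_delete` and the constants are made up for this sketch. It only works out which index names have fallen outside the retention window and prints the corresponding DELETE lines. Sending the requests to the cluster is left out.

```python
from datetime import date, datetime, timedelta

# Assumptions for this sketch: the prefix and date format follow the
# my-index-2017.01.30 example above; adjust them to your own index names.
INDEX_PREFIX = "my-index-"
DATE_FORMAT = "%Y.%m.%d"
RETENTION_DAYS = 3


def indices_to_delete(index_names, today, retention_days=RETENTION_DAYS):
    """Return the daily indices whose date suffix is outside the retention window."""
    # Keep today's index plus the previous (retention_days - 1) days.
    oldest_kept = today - timedelta(days=retention_days - 1)
    expired = []
    for name in index_names:
        if not name.startswith(INDEX_PREFIX):
            continue  # not one of the daily log indices
        suffix = name[len(INDEX_PREFIX):]
        try:
            index_day = datetime.strptime(suffix, DATE_FORMAT).date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if index_day < oldest_kept:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = [
        "my-index-2017.01.29",
        "my-index-2017.01.30",
        "my-index-2017.01.31",
        "my-index-2017.02.01",
        ".kibana",
    ]
    for name in indices_to_delete(names, today=date(2017, 2, 1)):
        # Each line corresponds to one DELETE request against the REST API.
        print(f"DELETE /{name}")
```

Indices that do not match the prefix or whose suffix is not a date are skipped on purpose, so a scheduled run does not touch unrelated indices.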
Update for 2020:
AWS ES now supports Index State Management, which lets you define custom management policies to automate routine tasks and apply them to indices and index patterns. You no longer need to set up and manage external processes to run your index operations.
For example, you can define a policy that moves your index into a read_only state after 30 days and then ultimately deletes it after 90 days.
Index State Management - https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/ism.html
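As a rough illustration of the 30-day/90-day example, the sketch below builds such a policy document as a Python dict and prints it as JSON. The state names and the helper `build_retention_policy` are made up for this sketch, and the field names (`default_state`, `states`, `actions`, `transitions`, `min_index_age`) are written as I understand the ISM policy format, so check them against the documentation linked above before using it. On Amazon ES the document would typically be sent with a PUT to the ISM policies API (a path like `_opendistro/_ism/policies/<policy_id>`, which is also an assumption to verify for your version).

```python
import json


def build_retention_policy(read_only_after="30d", delete_after="90d"):
    """Build an ISM-style policy: writable -> read_only -> delete."""
    return {
        "policy": {
            "description": "Make indices read-only, then delete them",
            "default_state": "writable",
            "states": [
                {
                    # New indices start here with no restrictions.
                    "name": "writable",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "read_only",
                            "conditions": {"min_index_age": read_only_after},
                        }
                    ],
                },
                {
                    # Block writes once the index is old enough.
                    "name": "read_only",
                    "actions": [{"read_only": {}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": delete_after},
                        }
                    ],
                },
                {
                    # Final state: remove the index.
                    "name": "delete",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_retention_policy(), indent=2))
```

For the 3-day log retention asked about in the question, the same shape could be reduced to two states, for example `build_retention_policy` with the read-only step dropped and a `min_index_age` of "3d" on the transition to delete.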

AWS - Aurora replicas

Scenario:
I have two reader-aurora replicas.
I make many calls to my system (high load)
I see only one replica working at 99.30%, but the other one is not doing anything at all.
Why? Is it because the second replica is ONLY there to cover failures of the first one? Isn't it possible to make both share the load?
In your RDS console, you should be able to look at each of the 3 instances
aurora-databasecluster-xxx.cluster-yyy.us-east-1.rds.amazonaws.com:3306
zz0.yyy.us-east-1.rds.amazonaws.com:3306
zz1.yyy.us-east-1.rds.amazonaws.com:3306
If you look at the cluster tab, you will see two endpoints, and the second one is the following:
aurora-databasecluster-xxx.cluster-ro-yyy.us-east-1.rds.amazonaws.com
Aurora also lets you connect explicitly to a specific read replica. This would allow one set of read-only nodes for OLTP performance and another set for data analysis, with long-running queries that won't impact performance.
If you use the -ro endpoint, it should balance across all read-only nodes, or you can have your code take a list of read-only connection strings and do your own randomizer. I would have expected the -ro endpoint to do better, but I am not yet familiar with their load-balancing technique (fewest connections, round robin, etc.).
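The "do your own randomizer" option could look roughly like the sketch below. It is a minimal Python illustration: the endpoint strings are the placeholder instance endpoints listed above, it assumes both of them are read replicas, and the helper names `pick_reader` and `split_host_port` are made up here. It only chooses an endpoint for each new connection; opening the actual database connection is left to whatever MySQL driver you use.

```python
import random

# Assumption: both placeholder instance endpoints from the answer are readers.
READER_ENDPOINTS = [
    "zz0.yyy.us-east-1.rds.amazonaws.com:3306",
    "zz1.yyy.us-east-1.rds.amazonaws.com:3306",
]


def pick_reader(endpoints, rng=random):
    """Pick one reader endpoint uniformly at random."""
    if not endpoints:
        raise ValueError("no reader endpoints configured")
    return rng.choice(endpoints)


def split_host_port(endpoint):
    """Split 'host:port' into (host, port) for passing to a database driver."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


if __name__ == "__main__":
    # Choose a reader per new connection so load spreads over both replicas.
    for _ in range(4):
        host, port = split_host_port(pick_reader(READER_ENDPOINTS))
        print(f"would connect to {host} on port {port}")
```

Note that the choice is made per connection, so an application that holds a few long-lived pooled connections can still end up with most of its load on one replica.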

DynamoDB cross-regional table copy only copies partial data

I've tried the default setup of Data Pipeline for cross-regional table copy, copying one table to another in the same region (eu-west-1).
On pipeline activation, an EMR cluster is launched, runs for approx. 20 minutes, and is then terminated with the pipeline in the "success" state.
The problem is that only 389 entries are copied from my table (the number is the same on multiple runs). The total number of entries is close to 100000.
I've tried turning on logs (no errors there), walking through them, launching a 4.1.0 cluster, increasing throughput, etc.; nothing solves it.
Does cross-regional table copy work? What could be the problem? Why no error? How do I debug it?
Datapipeline config: https://gist.github.com/mariusgrigaitis/adceb18354b52d845278