Elasticsearch on AWS: How to fix unassigned shards?

I have an index on AWS Elasticsearch with a primary shard that is UNASSIGNED due to NODE_LEFT. Here's the output of _cat/shards:
rawindex-2017.07.04 1 p STARTED
rawindex-2017.07.04 3 p UNASSIGNED NODE_LEFT
rawindex-2017.07.04 2 p STARTED
rawindex-2017.07.04 4 p STARTED
rawindex-2017.07.04 0 p STARTED
Under normal circumstances it would be easy to reassign these shards using the _cluster or _settings APIs. However, those are exactly the APIs that AWS does not allow. I get the following message:
{
Message: "Your request: '/_settings' is not allowed."
}
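For reference, on a self-managed 5.x cluster this is roughly the reroute call that would normally bring a stale primary back (the node name is a placeholder, and accept_data_loss is required), and it is exactly the kind of request the managed service rejects:
curl -X POST -H "Content-Type: application/json" \
"https://<es-endpoint>/_cluster/reroute" \
-d '{"commands": [{"allocate_stale_primary": {"index": "rawindex-2017.07.04", "shard": 3, "node": "<node-name>", "accept_data_loss": true}}]}'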
According to an answer to a very similar question, I can change index settings using the per-index _settings API, which is allowed by AWS. However, it seems that index.routing.allocation.disable_allocation is not valid for Elasticsearch 5.x, which I am running. I get the following error:
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[enweggf][x.x.x.x:9300][indices:admin/settings/update]"
}
],
"type": "illegal_argument_exception",
"reason": "unknown setting [index.routing.allocation.disable_allocation] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
},
"status": 400
}
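(Side note: the setting that replaced disable_allocation in later versions is index.routing.allocation.enable, so a per-index call along these lines should at least be accepted, although I am not sure it helps with the NODE_LEFT case:)
curl -X PUT -H "Content-Type: application/json" \
"https://<es-endpoint>/rawindex-2017.07.04/_settings" \
-d '{"index.routing.allocation.enable": "all"}'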
I tried prioritizing index recovery with a high index.priority as well as setting index.unassigned.node_left.delayed_timeout to 1 minute, but I still cannot get the shard reassigned.
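(Those were plain per-index _settings calls, roughly like the following; the priority value is just an example:)
curl -X PUT -H "Content-Type: application/json" \
"https://<es-endpoint>/rawindex-2017.07.04/_settings" \
-d '{"index.priority": 10, "index.unassigned.node_left.delayed_timeout": "1m"}'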
Is there any way (dirty or elegant) to achieve this on AWS managed ES?
Thanks!

I had a similar issue with AWS Elasticsearch version 6.3: two shards failed to be assigned and the cluster status was RED. Running GET _cluster/allocation/explain showed that the reason was that they had exceeded the default maximum allocation retries of 5.
Running GET <my-index-name>/_settings revealed the few settings that can be changed per index. Note that all queries are in Kibana Dev Tools format, which you get out of the box with the AWS Elasticsearch service. The following solved my problem:
PUT <my-index-name>/_settings
{
"index.allocation.max_retries": 6
}
Running GET _cluster/allocation/explain immediately afterwards returned an error with the following: "reason": "unable to find any unassigned shards to explain...", and after some time the problem was resolved.
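If you want to confirm that the shards are actually being picked up again, a quick check (same Kibana format) is:
GET _cat/shards/<my-index-name>?v
followed by GET _cluster/health once everything shows STARTED.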

There might be an alternative solution when the other solutions fail. If you have a managed Elasticsearch instance on AWS, chances are high that you can "just" restore a snapshot.
Check for failed indexes.
You can use, for example:
curl -X GET "https://<es-endpoint>/_cat/shards"
or
curl -X GET "https://<es-endpoint>/_cluster/allocation/explain"
Check for snapshots.
To find snapshot repositories execute the following query:
curl -X GET "https://<es-endpoint>/_snapshot?pretty"
Next let's have a look at all the snapshots in the cs-automated repository:
curl -X GET "https://<es-endpoint>/_snapshot/cs-automated/_all?pretty"
Find a snapshot where failures: [ ] is empty or the index you want to restore is NOT in a failed state. Then delete the index you want to restore:
curl -XDELETE 'https://<es-endpoint>/<index-name>'
... and restore the deleted index like this:
curl -XPOST 'https://<es-endpoint>/_snapshot/cs-automated/<snapshot-name>/_restore' -d '{"indices": "<index-name>"}' -H 'Content-Type: application/json'
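If you want to watch the restore progress, the per-index recovery API should show the shards being rebuilt:
curl -X GET "https://<es-endpoint>/<index-name>/_recovery?pretty"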
There is also some good documentation here:
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains-snapshots.html#es-managedomains-snapshot-restore
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-handling-errors.html#aes-handling-errors-red-cluster-status

I also faced a similar problem. The solution is pretty simple; you can solve it in two different ways.
The first solution is to edit all indexes collectively:
PUT _all/_settings
{
"index.allocation.max_retries": 3
}
The second solution is to edit a specific index:
PUT <myIndex>/_settings
{
"index.allocation.max_retries": 3
}

Related

AWS ElasticSearch - status red

We're having an issue with ElasticSearch on AWS.
The node has been in red status for a couple of hours now. I have no idea how to recover it.
I have tried a few suggestions:
curl -XGET -u 'username:password' 'host:443/_cluster/allocation/explain'
But all of the requests are coming back with:
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
The health dashboard shows the same red status.
Any ideas on how I can recover the instance?
UPDATE:
Comparing the health dashboard from 24 hours ago with now, it looks like one of the nodes has disappeared.
UPDATE:
Maybe there was too much RAM usage? How do I fix it? The missing node is not even listed in the list of nodes. Can I curl a specific node?
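(The closest I found is listing all nodes with the query below, but while the master is gone this presumably fails with the same master_not_discovered_exception.)
curl -XGET -u 'username:password' 'host:443/_cat/nodes?v'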
UPDATE:
Ended up just re-creating the domain from scratch. Apparently two dedicated master nodes is a no-go. You are supposed to have at least three: with only two master nodes, if one of them crashes the remaining one cannot form a quorum and does nothing to restore the cluster.
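For reference, the dedicated master configuration of an existing domain can also be changed with the AWS CLI, something along these lines (domain name and instance type are placeholders):
aws es update-elasticsearch-domain-config --domain-name <my-domain> \
--elasticsearch-cluster-config DedicatedMasterEnabled=true,DedicatedMasterCount=3,DedicatedMasterType=m5.large.elasticsearch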
If you are getting the master_not_discovered_exception error then there's something pretty wrong with your deployment.
You'd need to contact AWS support for this, as AWS is what is managing the node deployment at the end of the day.

How to increase _cluster/settings/cluster.max_shards_per_node for AWS Elasticsearch Service

I use AWS Elasticsearch service version 7.1 and its built-in Kibana to manage application logs. New indexes are created daily by Logstash. Logstash gets an error about the maximum shards limit being reached from time to time, and I have to delete old indexes for it to start working again.
I found from this document (https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-handling-errors.html) that I have an option to increase _cluster/settings/cluster.max_shards_per_node.
So I tried that by putting the following command in Kibana Dev Tools:
PUT /_cluster/settings
{
"defaults" : {
"cluster.max_shards_per_node": "2000"
}
}
But I got this error
{
"Message": "Your request: '/_cluster/settings' payload is not allowed."
}
Someone suggested that this error occurs when you try to update settings that are not allowed by AWS, but this document (https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-supported-es-operations.html#es_version_7_1) tells me that cluster.max_shards_per_node is in the allowed list.
Please suggest how to update this setting.
You're almost there; you just need to rename defaults to persistent:
PUT /_cluster/settings
{
"persistent" : {
"cluster.max_shards_per_node": "2000"
}
}
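To double-check that the change was applied, you can read the setting back and look for cluster.max_shards_per_node under persistent:
GET _cluster/settings?flat_settings=true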
Beware though, that the more shards you allow per node, the more resources each node will need and the worse the performance can get.

AWS ElasticSearch Transition to UltraWarm Fails

I am using the AWS ElasticSearch service and am attempting to use a policy to transition indices to UltraWarm storage. However, each time the migration to UltraWarm begins, Kibana displays the error "Failed to start warm migration" for the managed index. The complete error message is below. The "cause" is not very helpful. I am looking for help on how to identify / resolve the root cause of this issue. Thanks!
{
"cause": "[753f6f14e4f92c962243aec39d5a7c31][10.212.32.199:9300][indices:admin/ultrawarm/migration/warm]",
"message": "Failed to start warm migration"
}

Failed to convert server response to JSON | gcloud.services.operations.describe

I'm new to google cloud services. I was going through some tutorials and I had to run the following command in order to describe an operation.
$ gcloud services operations describe operations/acf.xxxx
However, this command failed with the following error:
ERROR: (gcloud.services.operations.describe) INTERNAL: Failed to convert server response to JSON
I'm performing these operations in Windows PowerShell using bash-style commands. Is there any way to resolve this?
Possibly a bug.
I get the same error on Linux with gcloud v.291.0.0.
You may wish to report this issue at Google's issuetracker
A useful feature of gcloud is that you can append --log-http to any command to see the underlying REST API calls, which is often (though not really in this case) more illuminating about the error.
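For example (substituting your own operation ID):
gcloud services operations describe operations/${OPERATION_ID} --log-http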
This yields (for me):
uri: https://serviceconsumermanagement.googleapis.com/v1beta1/operations/${OPERATION_ID}?alt=json
method: GET
...
{
"error": {
"code": 500,
"message": "Failed to convert server response to JSON",
"status": "INTERNAL"
}
}
Another excellent debugging tool is the APIs Explorer, which supports all of Google's REST endpoints. It is accessible from the API documentation:
https://cloud.google.com/service-infrastructure/docs/service-consumer-management/reference/rest/v1beta1/operations/get
If you complete the APIs Explorer form on the right-hand side, I suspect you'll receive the same error.
Both approaches appear to confirm that the issue is on Google's side.

AWS elastic-search. FORBIDDEN/8/index write (api). Unable to write to index

I am trying to dump a list of docs to an AWS Elasticsearch instance. It was running fine. Then, all of a sudden, it started throwing this error:
{ _index: '<my index name>',
_type: 'type',
_id: 'record id',
status: 403,
error:
{ type: 'cluster_block_exception',
reason: 'blocked by: [FORBIDDEN/8/index write (api)];' } }
I checked in forums. Most of them say that it is a JVM memory issue: if it goes above 92%, AWS stops all writes to the cluster/index. However, when I checked the JVM memory, it showed less than 92%. Am I missing something here?
This error is the Amazon ES service actively blocking writes to protect the cluster from reaching red or yellow status. It does this using index.blocks.write.
The two common reasons are:
Low Memory
When the JVMMemoryPressure metric exceeds 92% for 30 minutes, Amazon ES triggers a protection mechanism and blocks all write operations to prevent the cluster from reaching red status. When the protection is on, write operations fail with a ClusterBlockException error, new indexes can't be created, and the IndexCreateBlockException error is thrown.
When the JVMMemoryPressure metric returns to 88% or lower for five minutes, the protection is disabled, and write operations to the cluster are unblocked.
Low Disk Space
Elasticsearch has a default "low watermark" of 85%, meaning that once disk usage exceeds 85%, Elasticsearch no longer allocates shards to that node. Elasticsearch also has a default "high watermark" of 90%, at which point it attempts to relocate shards to other nodes.
This error indicates that AWS ElasticSearch has placed a block on your domain based upon disk space. At 85%, ES will not allow you to create any new indexes. At 90%, no new documents can be written.
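To see which of the two limits you are hitting, the cat APIs give a quick per-node view of heap pressure and disk usage:
curl -X GET "https://<es-endpoint>/_cat/nodes?v&h=name,heap.percent"
curl -X GET "https://<es-endpoint>/_cat/allocation?v"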
ES can apply a write block on an index during rollovers, or because of low disk space or memory.
In order to stop these errors, you need to remove the write block on the index by setting index.blocks.write to false:
curl -X PUT -H "Content-Type: application/json" \
'http://localhost:9200/{index_name}/_settings' \
-d '{ "index": { "blocks": { "write": "false" } } }'
The accepted solution was not enough in my case; I had to remove index.blocks.read_only_allow_delete as well:
PUT /my_index/_settings
{
"index.blocks.read_only_allow_delete": null,
"index.blocks.write": null
}
ES version 7.15
This can also happen if the index you're trying to write to has been marked as read-only. I've had it happen due to an Index State Management misconfiguration that caused a weekly index to be moved to a warm state after one day.
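A quick way to see whether any such blocks are still set on the index (read_only, read_only_allow_delete or write) is to dump its settings:
curl -X GET "https://<es-endpoint>/<index-name>/_settings?flat_settings=true&pretty"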