Elasticsearch 6.3 (AWS) snapshot restore progress ERROR: "/_recovery is not allowed"

I take manual snapshots of an Elasticsearch index.
These are stored in a snapshot repository on S3.
I have created a new ES cluster, also version 6.3.
I have connected the new cluster to the S3 snapshot repository via the Python script method described in this blog post (a sketch of that step is included after the restore request below): https://medium.com/docsapp-product-and-technology/aws-elasticsearch-manual-snapshot-and-restore-on-aws-s3-7e9783cdaecb
I have confirmed that the new cluster has access to the snapshot repository via the GET /_snapshot/manual-snapshot-repo/_all?pretty command.
I have initiated a snapshot restore to this new cluster via:
POST /_snapshot/manual-snapshot-repo/snapshot_name/_restore
{
"indices": "reports",
"ignore_unavailable": false,
"include_global_state": false
}
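(For reference, the repository registration in step 4 above boils down to a signed PUT to the _snapshot endpoint. Here is a minimal sketch of that step, assuming the requests and requests_aws4auth packages; the endpoint, bucket name, and role ARN are placeholders, not values from my setup.)

import boto3
import requests
from requests_aws4auth import AWS4Auth

# Placeholders only: substitute your own domain endpoint, region, bucket, and role.
host = 'https://your-new-domain.us-east-1.es.amazonaws.com'
region = 'us-east-1'
repo_name = 'manual-snapshot-repo'

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, 'es',
                   session_token=credentials.token)

# Register the existing S3 bucket as a snapshot repository on the new cluster.
payload = {
    "type": "s3",
    "settings": {
        "bucket": "your-snapshot-bucket",
        "region": region,
        "role_arn": "arn:aws:iam::123456789012:role/your-es-snapshot-role"
    }
}
response = requests.put(f"{host}/_snapshot/{repo_name}", auth=awsauth, json=payload)
print(response.status_code, response.text)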
It is clear that the restore operation has at least partially succeeded: the cluster status has gone from "green" to "yellow", a GET request to /_cluster/health yields information suggesting activity on an otherwise empty cluster, and storage is starting to be utilized (when viewing cluster health on AWS).
I would very much like to monitor the progress of the restore operation.
Elasticsearch docs suggest using the Recovery API. Docs link: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/indices-recovery.html
It is clear from the docs that GET /_recovery?human or GET /my_index/_recovery?human should yield restore progress.
However, I encounter the following error:
"Message": "Your request: '/_recovery' is not allowed."
I get the same message when attempting the GET command in the following ways:
Via Kibana dev tools
Via the Chrome address bar (it's just a GET operation, after all)
Via Advanced REST Client (a Chrome app)
I have not been able to locate any other mention of this particular error message.
How can I utilize the GET /_recovery?human command on my Elasticsearch 6.3 clusters?
Thank you!

Amazon managed Elasticsearch does not have all the endpoints available.
For version 6.3 you can check the list of supported operations in the AWS documentation; _recovery is not on that list, which is why you get that message.
Without the _recovery endpoint you will need to rely on _cluster/health.
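One way to watch the restore without _recovery is to poll _cluster/health until the status goes back to green. A minimal sketch, assuming the requests and requests_aws4auth packages and a placeholder domain endpoint (signed requests, as in your setup):

import time
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = 'https://your-new-domain.us-east-1.es.amazonaws.com'   # placeholder endpoint
region = 'us-east-1'

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, 'es',
                   session_token=credentials.token)

# Print shard-level health every 10 seconds until the cluster reports green.
for _ in range(360):
    health = requests.get(f"{host}/_cluster/health", auth=awsauth).json()
    print(health["status"],
          "initializing:", health["initializing_shards"],
          "unassigned:", health["unassigned_shards"])
    if health["status"] == "green":
        break
    time.sleep(10)

One caveat: if the new domain cannot allocate replica shards, the status may stay yellow even after the restore completes, so watch the initializing/unassigned counts (or GET /_cat/indices?v, which as far as I know is also allowed on managed domains) rather than waiting for green alone.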

Related

AWS Service Quota: How to get service quota for Amazon S3 using boto3

I get the error "An error occurred (NoSuchResourceException) when calling the GetServiceQuota operation" while trying to run the following boto3 Python code to get the value of the quota for "Buckets":
import boto3

client_quota = boto3.client('service-quotas')
resp_s3 = client_quota.get_service_quota(ServiceCode='s3', QuotaCode='L-89BABEE8')
In the above code, QuotaCode "L-89BABEE8" is for "Buckets". I presumed the value of ServiceCode for Amazon S3 would be "s3", so I put it there, but I guess that is wrong and is throwing the error. I tried finding documentation around the ServiceCode for S3 but could not find it. I even tried "S3" (uppercase 'S') and "Amazon S3", but those didn't work either.
What I tried:
client_quota = boto3.client('service-quotas')
resp_s3 = client_quota.get_service_quota(ServiceCode='s3', QuotaCode='L-89BABEE8')
What I expected:
Output in the format below for S3. The example below is for EC2; it is the output of resp_ec2 = client_quota.get_service_quota(ServiceCode='ec2', QuotaCode='L-6DA43717')
I just played around with this and I'm seeing the same thing you are: empty responses from any Service Quotas list or get command for the s3 service. However, s3 is definitely the correct service code, because it comes back from the Service Quotas list_services() call. Then I saw there are also list and get commands for the AWS default service quotas, and when I tried those they came back with data. I'm not entirely sure, but based on the docs I think any quota that can't be adjusted, and possibly any quota your account hasn't requested an adjustment for, will probably come back with an empty response from get_service_quota(), and you'll need to call get_aws_default_service_quota() instead.
So I believe what you need to do is probably run this first:
client_quota.get_service_quota(ServiceCode='s3', QuotaCode='L-89BABEE8')
And if that throws an exception, then run the following:
client_quota.get_aws_default_service_quota(ServiceCode='s3', QuotaCode='L-89BABEE8')
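A minimal sketch of that fallback logic, catching the NoSuchResourceException error code from your traceback and then asking for the AWS default quota:

import boto3
from botocore.exceptions import ClientError

client_quota = boto3.client('service-quotas')

def get_s3_bucket_quota():
    # Try the account-specific (applied) quota first.
    try:
        return client_quota.get_service_quota(
            ServiceCode='s3', QuotaCode='L-89BABEE8')['Quota']
    except ClientError as err:
        # Fall back to the AWS default quota when no applied value exists.
        if err.response['Error']['Code'] == 'NoSuchResourceException':
            return client_quota.get_aws_default_service_quota(
                ServiceCode='s3', QuotaCode='L-89BABEE8')['Quota']
        raise

print(get_s3_bucket_quota())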

AWS SageMaker Domain Status "Update_Failed" due to custom image appImageConfigName error

I'm having some trouble recovering from failures in attaching custom images to my sagemaker domain.
I first created a custom image according to here.
When I use sagemaker console to attach the image built with sm-docker, it appears to successfully "attach" in the domain's image list, but when inspecting the image in the console, it shows an error:
Value '' at 'appImageConfigName' failed to satisfy constraint: Member
must satisfy regular expression pattern
This occurs even when the repository name and tag consist only of alphanumeric characters.
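For context, my understanding of what the console attach does under the hood, written out explicitly with boto3 (all names below are placeholders, not my real resources); the AppImageConfig is where appImageConfigName has to be non-empty and match the allowed pattern:

import boto3

sm = boto3.client('sagemaker')

# Placeholder names and ARNs; substitute your own.
image_name = 'my-custom-image'
config_name = 'my-custom-image-config'
ecr_uri = '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest'
role_arn = 'arn:aws:iam::123456789012:role/MySageMakerExecutionRole'
domain_id = 'd-xxxxxxxxxxxx'

sm.create_image(ImageName=image_name, RoleArn=role_arn)
sm.create_image_version(ImageName=image_name, BaseImage=ecr_uri)

# The AppImageConfig supplies the appImageConfigName that the error complains about.
sm.create_app_image_config(
    AppImageConfigName=config_name,
    KernelGatewayImageConfig={
        'KernelSpecs': [{'Name': 'python3', 'DisplayName': 'Python 3'}],
        'FileSystemConfig': {'MountPath': '/root', 'DefaultUid': 0, 'DefaultGid': 0},
    },
)

# Attach the image to the domain by image name plus config name.
sm.update_domain(
    DomainId=domain_id,
    DefaultUserSettings={
        'KernelGatewayAppSettings': {
            'CustomImages': [
                {'ImageName': image_name, 'AppImageConfigName': config_name}
            ]
        }
    },
)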
After obtaining this error, I deleted the repositories in ECR.
Since then, the domain fails to update and I am unable to launch any apps or attempt to attach additional images.
The first issue I would like to address is restoring functionality of my sagemaker domain so I can further troubleshoot the issue. I am unable to delete the domain because of this error, even when there are no users, apps, or custom images associated with the domain.
The second issue I would like to address is being able to troubleshoot the appImageConfigName error.
Thanks!
While I was unable to delete the domain via the console, I was able to delete it via the CLI.
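For completeness, a sketch of the equivalent call through boto3 (the domain ID is a placeholder; I used the CLI, so treat this as an untested translation):

import boto3

sm = boto3.client('sagemaker')

# Placeholder domain ID; find yours with sm.list_domains().
domain_id = 'd-xxxxxxxxxxxx'

# Delete the domain, choosing whether to keep its home EFS file system.
sm.delete_domain(
    DomainId=domain_id,
    RetentionPolicy={'HomeEfsFileSystem': 'Retain'}   # or 'Delete'
)

# The domain should now report a 'Deleting' status.
print(sm.describe_domain(DomainId=domain_id)['Status'])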

aws kinesis data analytics application (flink) change property originally located at flink-conf.yaml

As the runtime for my Flink application I use AWS managed Flink (a Kinesis Data Analytics application).
I added a sink to write processed events from the Kinesis stream to S3 in Parquet format.
Locally everything works for me, but when I try to run the application in the cloud I get the following exception:
"throwableInformation": [
"com.esotericsoftware.kryo.KryoException: Error constructing instance of class: org.apache.avro.Schema$LockableArrayList",
"Serialization trace:",
"types (org.apache.avro.Schema$UnionSchema)",
"schema (org.apache.avro.Schema$Field)",
"fieldMap (org.apache.avro.Schema$RecordSchema)",
While looking for a solution to the problem, I found that I need to change the following property (verified on a local cluster):
classloader.resolve-order: child-first -> classloader.resolve-order: parent-first
Is it possible to change this configuration in any way when using AWS managed Flink (Kinesis Data Analytics applications, not EMR)?
AWS support's answer: No. This property cannot be changed.

how to check if gcloud backend service/url map are ready

Is there a way to determine if a backend service is ready? I ask because I run a script that creates a backend and then a URL map that uses this backend. The problem is I sometimes get errors saying the backend is not ready for use. I need to be able to pause until the backend is ready before I create the URL map. I could check the error response for the phrase 'is not ready', but this isn't reliable for future versions of gcloud. This is somewhat related to another post I recently made on how to reliably check for gcloud errors.
I could say the same for the URL map: when I create a proxy that uses the URL map, I sometimes get an error saying the URL map is not ready.
Here's an example of what I'm experiencing:
gcloud compute url-maps add-path-matcher app-url-map \
  --path-matcher-name=web-path-matcher \
  --default-service=web-backend \
  --new-hosts="example.com" \
  --path-rules="/*=web-backend"
ERROR: (gcloud.compute.url-maps.add-path-matcher) Could not fetch resource:
- The resource 'projects/my-project/global/backendServices/web-backend' is not ready
gcloud compute target-https-proxies create app-https-proxy \
  --url-map app-url-map \
  --ssl-certificates app-ssl-cert
ERROR: (gcloud.compute.target-https-proxies.create) Could not fetch resource:
- The resource 'projects/my-project/global/urlMaps/app-url-map' is not ready
gcloud -v
Google Cloud SDK 225.0.0
beta 2018.11.09
bq 2.0.37
core 2018.11.09
gsutil 4.34
I would assume it's gcloud alpha resources list ...
See the Error Messages page of the Resource Manager and scroll down to the bottom, where it reads:
notReady: The API server is not ready to accept requests.
which equals HTTP 503, SERVICE_UNAVAILABLE.
Adding the --verbosity option might provide some more details.
See the documentation.
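If the goal is simply to make the script pause until the resource is usable, one blunt approach (my own assumption, not something the gcloud docs prescribe) is to retry the dependent command itself with a capped number of attempts instead of parsing the error text. A Python sketch driving gcloud via subprocess, using the resource names from the question:

import subprocess
import time

def run_with_retry(cmd, attempts=30, delay=10):
    # Retry a gcloud command until it succeeds or we give up, waiting
    # between attempts so the backend/url map has time to become ready.
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        print(f"attempt {attempt} failed: {result.stderr.strip()}")
        time.sleep(delay)
    raise RuntimeError(f"command kept failing: {' '.join(cmd)}")

run_with_retry([
    "gcloud", "compute", "url-maps", "add-path-matcher", "app-url-map",
    "--path-matcher-name=web-path-matcher",
    "--default-service=web-backend",
    "--new-hosts=example.com",
    "--path-rules=/*=web-backend",
])

run_with_retry([
    "gcloud", "compute", "target-https-proxies", "create", "app-https-proxy",
    "--url-map", "app-url-map",
    "--ssl-certificates", "app-ssl-cert",
])

The obvious caveat is that this also retries on genuine failures until the attempt cap is hit, which is the trade-off for not matching on the 'is not ready' wording.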

Can't close ElasticSearch index on AWS?

I've created a new AWS ElasticSearch domain, for testing. I use ES on a different host right now, and I'm looking to move to AWS.
One thing I need to do is set the mapping (analyzers) on my instance. In order to do this, I need to "close" the index, or else ES will just raise an exception.
Whenever I try to close the index, though, I get an exception from AWS:
Your request: '/_all/_close' is not allowed by CloudSearch.
The AWS ES documentation specifically says to do this in some cases:
curl -XPOST 'http://search-weblogs-abcdefghijklmnojiu.us-east-1.a9.com/_all/_close'
I haven't found any documentation that says why I wouldn't be able to close my indices on AWS ES, nor have I found anyone else who has this problem.
It's also a bit strange that I've got an Elasticsearch domain but I'm getting a CloudSearch error message, since I thought those were different services, though I suppose one is implemented in terms of the other.
thanks!
AWS Elasticsearch does not support the "close" operation on indexes.
http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html
"Currently, Amazon ES does not support the Elasticsearch _close API"
According to an AWS document I found recently, you first have to upgrade your Elasticsearch domain to version 7.4 or greater:
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-handling-errors.html#aes-troubleshooting-close-api
Since closing all indices at once is a dangerous action, it may be disabled by default on your cluster. You need to make sure that your elasticsearch.yml configuration file doesn't contain this:
action.destructive_requires_name: true
You could set the following in your configuration file and restart your cluster, but I strongly advise against that, since it opens the door to all kinds of other destructive actions, like deleting all your indices at once:
action.destructive_requires_name: false
What you should do instead is to temporarily update the cluster settings using
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "persistent" : {
    "action.destructive_requires_name" : false
  }
}'
Then close all your indices
curl -XPOST localhost:9200/_all/_close
And then reset the settings to a safer value:
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "persistent" : {
    "action.destructive_requires_name" : true
  }
}'