Elasticsearch Snapshot Fails With RepositoryMissingException - amazon-web-services

Three-node Elasticsearch cluster on AWS. Bigdesk and Head both show a healthy cluster. All three nodes are running ES 1.3 and the latest Amazon Linux updates. When I fire off a snapshot request like:
http://localhost:9200/_snapshot/taxanalyst/201409031540-snapshot?wait_for_completion=true
the server churns away for several minutes before responding with the following:
{
  "snapshot": {
    "snapshot": "201409031521-snapshot",
    "indices": [
      "docs",
      "pdflog"
    ],
    "state": "PARTIAL",
    "start_time": "2014-09-03T19:21:36.034Z",
    "start_time_in_millis": 1409772096034,
    "end_time": "2014-09-03T19:28:48.685Z",
    "end_time_in_millis": 1409772528685,
    "duration_in_millis": 432651,
    "failures": [
      {
        "node_id": "ikauhFYEQ02Mca8fd1E4jA",
        "index": "pdflog",
        "reason": "RepositoryMissingException[[faxmanalips] missing]",
        "shard_id": 0,
        "status": "INTERNAL_SERVER_ERROR"
      }
    ],
    "shards": {
      "total": 10,
      "failed": 1,
      "successful": 9
    }
  }
}
These are three nodes on three different virtual EC2 machines, but they're able to communicate via 9300/9200 without any problems. Indexing and searching work as expected. There doesn't appear to be anything in the Elasticsearch log files that speaks to the server error.
Does anyone know what's going on here, or at least where a good place to start would be?
UPDATE: Turns out that each of the nodes in the cluster needs to have a snapshot directory that matches the directory specified when you register the snapshot repository with the Elasticsearch cluster.
I guess the next question is: when you want to tgz up the snapshot directory so you can archive it, or provision a backup cluster, is it sufficient to just tgz the snapshot directory on the Master node? Or do you have to somehow consolidate the snapshot directories of all the nodes? (That can't be right, can it?)

Elasticsearch supports a shared file system repository, which uses a shared file system to store snapshots.
In order to register the shared file system repository, it is necessary to mount the same shared filesystem to the same location on all master and data nodes.
All you need to do is put the same repository path in the elasticsearch.yml of all 3 nodes.
For example:
path.repo: [/my_repository]
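Once that is in place, registering the repository against the shared mount and re-running the snapshot might look roughly like this (a sketch that reuses the repository name from the question and the path above):

# Register a shared-filesystem repository named "taxanalyst"
# pointing at the shared mount (must be identical on all nodes).
curl -XPUT 'http://localhost:9200/_snapshot/taxanalyst' -d '{
  "type": "fs",
  "settings": {
    "location": "/my_repository"
  }
}'

# The snapshot request from the question should then no longer hit
# RepositoryMissingException on any node:
curl -XPUT 'http://localhost:9200/_snapshot/taxanalyst/201409031540-snapshot?wait_for_completion=true'

Since every node then writes into the same shared mount, archiving that one mount (for example with tar) should capture the whole repository rather than just one node's piece of it.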

I think you are looking for this AWS plugin for Elasticsearch (I guess you already installed it to configure your cluster): https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
It will allow you to create a repository mapped to an S3 bucket.
To use (create/restore/whatever) a snapshot, you need to create a repository first. Then, whenever you perform an action on a snapshot, Elasticsearch will manage it directly in your S3 bucket.
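As a rough sketch (bucket name and region are placeholders, and the cloud-aws plugin must be installed on every node), registering an S3-backed repository could look like:

# Create a snapshot repository of type "s3" backed by a bucket.
curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "region": "us-east-1"
  }
}'

# Snapshots taken against this repository are then stored in the bucket:
curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository/my-first-snapshot?wait_for_completion=true'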

Related

EBS Direct APIs: Starting a snapshot and putting data into it

I'm trying to (1) create a new EBS snapshot using the EBS direct APIs and (2) put data into the newly-created snapshot. I keep getting the following error at step #2:
"Error parsing parameter '--block-data': Blob values must be a path to a file."
I'm sure it has to do with the file path (/tmp/data) in step #2, but I'm not sure what that file path should be or what exactly should be in there.
All help is appreciated. TY!
Here are my CLI commands (from the EC2 instance I'm trying to snapshot):
Start a Snapshot:
aws ebs start-snapshot --volume-size 8 --timeout 60 --client-token 550e8400-e29b-41d4-a716-446655440000
OUTPUT:
{
  "Status": "pending",
  "KmsKeyArn": "arn:aws:kms:us-east-1:721340000000:key/a0919dc2-5e54-4a66-b52bEXAMPLE",
  "BlockSize": 524288,
  "VolumeSize": 8,
  "StartTime": 1663609346.678,
  "SnapshotId": "snap-0d0b369bf6EXAMPLE",
  "OwnerId": "7213410000000"
}
Put data into the newly-created snapshot:
aws ebs put-snapshot-block --snapshot-id snap-0d0b369bf6EXAMPLE --block-index 1000 --data-length 524288 --block-data /tmp/data --checksum UK3qYfpOd6sRG4FHFgl6v9Bfg6IHtH60Upu9TXXXXXX= --checksum-algorithm SHA256
This has been my guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/writesnapshots.html
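The error suggests the CLI could not read /tmp/data as an existing file. A hedged sketch of preparing a block before calling put-snapshot-block (file contents, paths, and IDs are placeholders from the question; the block must be exactly --data-length bytes):

# Create a 524288-byte (one BlockSize) data file to upload.
dd if=/dev/urandom of=/tmp/data bs=524288 count=1

# Base64-encoded SHA256 checksum of the block, as expected by --checksum.
CHECKSUM=$(openssl dgst -sha256 -binary /tmp/data | base64)

aws ebs put-snapshot-block \
  --snapshot-id snap-0d0b369bf6EXAMPLE \
  --block-index 1000 \
  --data-length 524288 \
  --block-data /tmp/data \
  --checksum "$CHECKSUM" \
  --checksum-algorithm SHA256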

Linking to a Google Cloud bucket file in a terminal command?

I'm trying to find my way with Google Cloud.
I have a Debian VM instance that I am running a server on. It is installed and working via an SSH connection in a browser window. The command to start the server is "./ninjamsrv config-file-path.cfg"
I have the config file in my default Google Firebase storage bucket, as I will need to update it regularly.
I want to start the server referencing the cfg file in the bucket, e.g:
"./ninjamsrv gs://my-bucket/ninjam-config.cfg"
But the file is not found:
error opening configfile 'gs://my-bucket/ninjam-config.cfg'
Error loading config file!
However if I run:
"gsutil acl get gs://my-bucket/"
I see:
[
  {
    "entity": "project-editors-XXXXX",
    "projectTeam": {
      "projectNumber": "XXXXX",
      "team": "editors"
    },
    "role": "OWNER"
  },
  {
    "entity": "project-owners-XXXXX",
    "projectTeam": {
      "projectNumber": "XXXXX",
      "team": "owners"
    },
    "role": "OWNER"
  },
  {
    "entity": "project-viewers-XXXXX",
    "projectTeam": {
      "projectNumber": "XXXXX",
      "team": "viewers"
    },
    "role": "READER"
  }
]
Can anyone advise what I am doing wrong here? Thanks
The first thing to verify is if indeed the error thrown is a permission one. Checking the logs related to the VM’s operations will certainly provide more details in that aspect, and a 403 error code would confirm if this is a permission issue. If the VM is a Compute Engine one, you can refer to this documentation about logging.
If the error is indeed a permission one, then you should verify if the permissions for this object are set as “fine-grained” access. This would mean that each object would have its own set of permissions, regardless of the bucket-level access set. You can read more about this here. You could either change the level of access to “uniform” which would grant access to all objects in the relevant bucket, or make the appropriate permissions change for this particular object.
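If it does turn out to be object-level permissions, a hedged sketch of those two options with gsutil (bucket and account names are placeholders):

# Option 1: switch the bucket to uniform bucket-level access.
gsutil ubla set on gs://my-bucket

# Option 2: keep fine-grained access and grant the VM's service account
# read access to the one object it needs.
gsutil acl ch -u my-vm-sa@my-project.iam.gserviceaccount.com:READER gs://my-bucket/ninjam-config.cfg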
If the issue is not a permission one, then I would recommend trying to start the server from the same .cfg file hosted on the local directory of the VM. This might point the error at the file itself, and not its hosting on Cloud Storage. In case the server starts successfully from there, you may want to re-upload the file to GCS in case the file got corrupted during the initial upload.
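A simple way to test that last suggestion is to copy the file onto the VM first and point the server at the local copy (paths are placeholders):

# Pull the config out of the bucket onto the VM's local disk.
gsutil cp gs://my-bucket/ninjam-config.cfg ./ninjam-config.cfg

# Start the server from the local copy instead of the gs:// URL.
./ninjamsrv ./ninjam-config.cfg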

ElasticSearch massive purge of deleted documents reasons? (AWS ES service)

Is there any info on when and why ES may automatically trigger a purge of documents marked for deletion?
Where can logs with possible info about the trigger be found?
The service in question is actually AWS ES, but I do not think that is related to the topic... maybe I'm wrong?
The version in question is Elasticsearch 5.1.
When the merge happens, the marked documents will be purged.
There are some merge policies that indicate when the merge process is triggered, for example when the number of segment files is more than 300 or the marked documents make up more than 15% of a segment.
There is some information here for Elasticsearch 1.4:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-merge.html
It seems that the developers don't want to clarify the policies anymore.
This is an example of merge policy settings:
"merge": {
"scheduler": {
"max_thread_count": "1",
"auto_throttle": "true",
"max_merge_count": "6"
},
"policy": {
"reclaim_deletes_weight": "2.0",
"floor_segment": "2mb",
"max_merge_at_once_explicit": "30",
"max_merge_at_once": "10",
"max_merged_segment": "5gb",
"expunge_deletes_allowed": "10.0",
"segments_per_tier": "10.0",
"deletes_pct_allowed": "33.0"
}
For logging the merge process, I think you should change the log level to INFO or DEBUG (log4j settings).
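If it helps, one way to do that at runtime is through the cluster settings API, which accepts logger levels; a sketch only (the exact logger name varies by version, and AWS ES may not allow this setting):

# Raise the logger that reports merge activity to DEBUG at runtime.
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "logger.org.elasticsearch.index.engine": "DEBUG"
  }
}'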

Is there a way to get info at runtime about the SparkMetrics configuration

I added a metrics.properties file to the resources directory (Maven project) with a CSV sink. Everything is fine when I run the Spark app locally - metrics appear. But when I run the same fat jar on Amazon EMR, I do not see any attempt to put metrics into the CSV sink. So I want to check at runtime what the loaded settings for the Spark metrics subsystem are. Is there any possibility to do this?
I looked into SparkEnv.get.metricsSystem but didn't find anything.
That is basically because Spark on EMR is not picking up your custom metrics.properties file from the resources dir of the fat jar.
For EMR, the preferred way to configure this is through the EMR Configurations API, in which you need to pass the classification and properties as embedded JSON.
For the Spark metrics subsystem, here is an example that modifies a couple of metrics properties:
[
  {
    "Classification": "spark-metrics",
    "Properties": {
      "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
      "*.sink.csv.period": "1"
    }
  }
]
You can use this JSON when creating the EMR cluster through the Amazon Console or through the SDK.
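For instance, with the AWS CLI the JSON above could be saved to a file and passed at cluster creation; all other parameters here are placeholders, only the --configurations flag and the spark-metrics classification come from above:

# spark-metrics.json contains the classification JSON shown above.
aws emr create-cluster \
  --name "spark-metrics-example" \
  --release-label emr-5.30.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file://spark-metrics.json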

What does 'Logging' do in Dockerrun.aws.json

I'm struggling to work out what the Logging tag does in the Dockerrun.aws.json file for a Single Container Docker configuration. All the official docs say about it is "Logging – Maps the log directory inside the container."
This sounds like they essentially create a volume from /var/log on the EC2 instance to a directory in the docker filesystem as specified by Logging. I have the following Dockerrun.aws.json file:
{
  "AWSEBDockerrunVersion": "1",
  ...
  "Logging": "/var/log/supervisor"
}
However, when I go to the AWS Console and request the logs for my instance, none of my custom log files located in /var/log/supervisor are in the log bundles. Can anyone explain to me what the purpose of this Logging tag is and how I may use it (or not) to retrieve my custom logs?
EDIT
Here are the Volume mappings for my container (didn't think to check that):
"Volumes": {
"/var/cache/nginx": "/var/lib/docker/vfs/dir/ff6ecc190ba3413660a946c557f14a104f26d33ecd13a1a08d079a91d2b5158e",
"/var/log/supervisor": "/var/log/eb-docker/containers/eb-current-app"
},
"VolumesRW": {
"/var/cache/nginx": true,
"/var/log/supervisor": true
}
It turns out that /var/log/supervisor is mapped to /var/log/eb-docker/containers/eb-current-app rather than /var/log as I originally suspected. It'd be nice if this were clearer in the documentation.
But it also turns out that I was running the wrong Docker image, which explains why my log files weren't appearing anywhere! Doh!
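For reference, the volume mappings above can be pulled on the instance itself with docker inspect; the container selection here is just one way to do it:

# Show the Volumes/VolumesRW sections for the running app container.
docker inspect $(docker ps -q) | grep -A 5 '"Volumes"'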