ElasticSearch massive purge of deleted documents reasons? (AWS ES service)

Is there any info on when and why ES may automatically trigger a purge of documents marked for deletion?
Where can I find logs with information about what triggered it?
The service in question is actually AWS ES, but I don't think that is relevant to the topic... maybe I'm wrong?
The version in question is Elasticsearch 5.1.

When a merge happens, the documents marked as deleted are purged.
There are merge policies that determine when the merge process is triggered, for example when the number of segment files exceeds a threshold (e.g. 300) or when deleted documents make up more than about 15% of a segment.
There is some information here for Elasticsearch 1.4:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-merge.html
It seems that the developers don't want to document the policies in detail anymore.
This is an example of merge policy settings:
"merge": {
"scheduler": {
"max_thread_count": "1",
"auto_throttle": "true",
"max_merge_count": "6"
},
"policy": {
"reclaim_deletes_weight": "2.0",
"floor_segment": "2mb",
"max_merge_at_once_explicit": "30",
"max_merge_at_once": "10",
"max_merged_segment": "5gb",
"expunge_deletes_allowed": "10.0",
"segments_per_tier": "10.0",
"deletes_pct_allowed": "33.0"
}
For logging the merge process, I think you should raise the log level to INFO or DEBUG (log4j settings).
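A minimal sketch of doing that dynamically, assuming the cluster lets you change logger settings at runtime (AWS ES restricts some settings, so this may not be accepted there); the package name below simply widens logging for the whole index module, including merging:
PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.index": "DEBUG"
  }
}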

Related

google cloud platform -- creating alert policy -- how to specify message variable in alerting documentation markdown?

So I've created a logging alert policy on Google Cloud that monitors the project's logs and sends an alert if it finds a log that matches a certain query. This is all good and fine, but whenever it does send an email alert, it's barebones. I am unable to include anything useful in the email alert, such as the actual message; the user must instead click on "View incident" and go to the specified timeframe of when the alert happened.
Is there no way to include the message? As far as I can tell from viewing the GCP "Using Markdown and variables in documentation templates" doc, there is no way to do this.
I'm only really able to use ${resource.label.x} which isn't really all that useful because it already includes most of that stuff by default in the alert.
Could I have something like ${jsonPayload.message}? It didn't work when I tried it.
Probably (!) not.
To be clear, the alerting policies track metrics (not logs) and you've created a log-based metric that you're using as the basis for an alert.
There's information loss between the underlying log (which contains e.g. jsonPayload) and the metric that's produced from it (which probably does not). You can create log-based metric labels using expressions that extract fields from the underlying log entries.
However, per the example in Google's docs, you'd want to restrict these labels to a limited (enum-like) set of values (e.g. HTTP status, although even that may be too broad) rather than a potentially unbounded field like jsonPayload.message.
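For instance, here is a hedged sketch of a log-based metric definition (the metric name, filter, and the "code" label are all made up for illustration) that pulls a bounded field out of the log entry via labelExtractors:
{
  "name": "error-count",
  "filter": "severity>=ERROR AND jsonPayload.code:*",
  "metricDescriptor": {
    "metricKind": "DELTA",
    "valueType": "INT64",
    "labels": [
      { "key": "code", "valueType": "STRING" }
    ]
  },
  "labelExtractors": {
    "code": "EXTRACT(jsonPayload.code)"
  }
}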
It is possible. Suppose you need to pass the "jsonPayload.message" field from your GCP log into the documentation section of your policy. You need to use the "label_extractors" feature to extract the log message.
Here is a policy-creation JSON template in which "jsonPayload.message" is surfaced in the documentation section of the policy.
policy_json = {
    "display_name": "<policy_name>",
    "documentation": {
        "content": "I have extracted the log message: ${log.extracted_label.msg}",
        "mime_type": "text/markdown"
    },
    "user_labels": {},
    "conditions": [
        {
            "display_name": "<condition_name>",
            "condition_matched_log": {
                "filter": "<filter_condition>",
                "label_extractors": {
                    "msg": "EXTRACT(jsonPayload.message)"
                }
            }
        }
    ],
    "alert_strategy": {
        "notification_rate_limit": {
            "period": "300s"
        },
        "auto_close": "604800s"
    },
    "combiner": "OR",
    "enabled": True,
    "notification_channels": [
        "<notification_channel>"
    ]
}
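A minimal usage sketch, not from the original answer: it assumes the google-auth library, Application Default Credentials with permission to create alert policies, and that the project id is picked up from the environment; it simply POSTs the policy_json dict above to the Monitoring API's alertPolicies.create endpoint (protobuf JSON parsing accepts the snake_case field names used above).
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Sketch: assumes Application Default Credentials are configured and the
# <...> placeholders in policy_json have been filled in first.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/monitoring"]
)
session = AuthorizedSession(credentials)

# alertPolicies.create: POST the policy body under the project resource.
resp = session.post(
    f"https://monitoring.googleapis.com/v3/projects/{project_id}/alertPolicies",
    json=policy_json,
)
resp.raise_for_status()
print("Created policy:", resp.json()["name"])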

AWS ElasticSearch, Filebeat, Kibana: Indices managed by policy automatically set to NO even after applying policy

ES version: 7.9
Hello friends,
I am working on AWS Elasticsearch; we are currently pushing our logs to Kibana via Filebeat. To avoid ES storage filling up, we have a lifecycle policy which deletes logs older than 10 days. This has worked for us for about a month,
but now when I checked the indices, it says the indices are not being managed by any policy. No one among us has changed that setting. What has caused this change? Any ideas?
My suspicion is that ILM is being set to false via Filebeat, but I want to be sure.
We are using the following configuration via Filebeat:
output.elasticsearch:
  hosts: ["$filebeat_host"]
  protocol: "https"
output.elasticsearch.index: "filbeat-${TIER_NAME}"
setup.template.name: "filebeat-${TIER_NAME}"
setup.template.pattern: "filebeat-${TIER_NAME}"
setup.ilm.enabled: false
setup.pack.security.enabled: false
setup.xpack.graph.enabled: false
setup.xpack.watcher.enabled: false
setup.xpack.monitoring.enabled: false
setup.xpack.reporting.enabled: false
Any ideas. Thanks a lot. :-)
Since you are using AWS Elasticsearch, ILM settings won't help, as ILM is part of Elastic's X-Pack feature set.
Can you check whether ISM (Index State Management) settings are correctly configured for your indices?
Since you mentioned that it worked for about a month, it's possible that the newly created indices do not have the ISM policy attached to them and hence aren't getting deleted.
Here is how to apply ISM to new indices using index templates - AWS ElasticSearch: How to apply a policy to an index
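For indices that already exist without the policy, here is a hedged sketch using Open Distro's ISM "add policy" API (the index pattern and policy id below are placeholders for your own values):
POST _opendistro/_ism/add/filebeat-*
{
  "policy_id": "<your_policy_id>"
}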
To make sure the ISM policy is applied to new indices, you can use templates:
POST _template/templateName
{
  "order": 0,
  "index_patterns": [
    "your-pattern-here*"
  ],
  "settings": {
    "index": {
      "number_of_shards": "TODO",
      "number_of_replicas": "TODO",
      "opendistro": {
        "index_state_management": {
          "policy_id": "your policy id"
        }
      }
    }
  }
}
Simply insert your pattern to match your indices, the id of your index policy, and the shards and replicas you want. I find it easier to manage indices in Elasticsearch itself, as you then have a central location where you manage the lifecycle of your logs.

Why is my S3 lifecycle policy not taking effect?

I have an S3 lifecycle policy to delete objects after 3 days, and I am using a prefix. My problem is that the policy works for all but one sub-directory. For example, let's say my bucket looks like this:
s3://my-bucket/myPrefix/env=dev/
s3://my-bucket/myPrefix/env=stg/
s3://my-bucket/myPrefix/env=prod/
When I check the stg and prod directories, there are no objects older than 3 days. However, when I check the dev directory, there are objects a lot older than that.
Note - There is a huge difference between the volume of data in dev compared to the other 2. Dev holds a lot more logs than the others.
My initial thought was that it was taking longer for eventual consistency to reflect what had been deleted, but that theory is gone considering the time that has passed.
The issue seems related to the amount of data under this prefix compared to the others, but I'm not sure what I can do to resolve this. Should I have another policy specific to this location, or is there somewhere I can check to see what is causing the failure? I did not see anything in CloudTrail for this event.
Here is my policy:
{
  "Rules": [
    {
      "Expiration": {
        "Days": 3
      },
      "ID": "Delete Object When Stale",
      "Prefix": "myPrefix/",
      "Status": "Enabled"
    }
  ]
}

Set 'maxActiveInstances' error

I am using AWS data-pipeline to export a DDB table, but when I activate I get an error:
Web service limit exceeded: Exceeded number of concurrent executions. Please set the field 'maxActiveInstances' to a higher value in your pipeline or wait for the currenly running executions to complete before trying again (Service: DataPipeline; Status Code: 400; Error Code: InvalidRequestException; Request ID: efbf9847-49fb-11e8-abef-1da37c3550b5)
How do I set this maxActiveInstances property using the AWS UI?
You can set it as a property on your Ec2Resource [1] (or EmrActivity [2]) object. Using the UI, click Edit Pipeline, then click Resources on the right-hand side of the screen (it's a collapsible menu). There should be an Ec2Resource object with a drop-down called "Add an additional field", and you should see max active instances in the drop-down.
[1]https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-ec2resource.html
[2] https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emractivity.html
We ran into this too. For an on-demand pipeline, it looks like after a certain number of retries, you have to give it time to finish terminating the provisioned resources before you will be allowed to try again.
Solution: Patience.
With an on-demand pipeline you can specify it in the 'Default' object, like this:
{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default",
      "maxActiveInstances": "5"
    },
    ...
I couldn't add it in Architect; I had to create another pipeline from the JSON. But once that was done, I could edit it in Architect (under the 'Others' section).
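If you go the JSON route, here is a hedged sketch of pushing and activating the definition with the AWS CLI (the pipeline id and file name are placeholders):
aws datapipeline put-pipeline-definition --pipeline-id df-EXAMPLE --pipeline-definition file://pipeline.json
aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE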

Elasticsearch Snapshot Fails With RepositoryMissingException

Three node ElasticSearch cluster on AWS. Bigdesk and Head both show a healthy cluster. All three nodes are running ES 1.3, and the latest Amazon Linux updates. When I fire off a snapshot request like:
http://localhost:9200/_snapshot/taxanalyst/201409031540-snapshot?wait_for_completion=true
the server churns away for several minutes before responding with the following:
{
  "snapshot": {
    "snapshot": "201409031521-snapshot",
    "indices": [
      "docs",
      "pdflog"
    ],
    "state": "PARTIAL",
    "start_time": "2014-09-03T19:21:36.034Z",
    "start_time_in_millis": 1409772096034,
    "end_time": "2014-09-03T19:28:48.685Z",
    "end_time_in_millis": 1409772528685,
    "duration_in_millis": 432651,
    "failures": [
      {
        "node_id": "ikauhFYEQ02Mca8fd1E4jA",
        "index": "pdflog",
        "reason": "RepositoryMissingException[[faxmanalips] missing]",
        "shard_id": 0,
        "status": "INTERNAL_SERVER_ERROR"
      }
    ],
    "shards": {
      "total": 10,
      "failed": 1,
      "successful": 9
    }
  }
}
These are three nodes on three different virtual EC2 machines, but they're able to communicate via 9300/9200 without any problems. Indexing and searching works as expected. There doesn't appear to be anything in the elasticsearch log files that speaks to the server error.
Does anyone know what's going on here, or at least where a good place to start would be?
UPDATE: It turns out that each node in the cluster needs to have a snapshot directory that matches the location specified when you register the snapshot repository with the Elasticsearch cluster.
I guess the next question is: when you want to tgz up the snapshot directory so you can archive it, or provision a backup cluster, is it sufficient to just tgz the snapshot directory on the Master node? Or do you have to somehow consolidate the snapshot directories of all the nodes. (That can't be right, can it?)
Elasticsearch supports a shared file system repository, which uses a shared file system to store snapshots.
In order to register the shared file system repository, it is necessary to mount the same shared filesystem to the same location on all master and data nodes.
Then add the same repository path to elasticsearch.yml on all 3 nodes, for example:
path.repo: ["/my_repository"]
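Once the path is whitelisted and the nodes restarted, the repository itself is registered through the snapshot API; a hedged sketch, reusing the repository name from the question and the path above:
PUT _snapshot/taxanalyst
{
  "type": "fs",
  "settings": {
    "location": "/my_repository"
  }
}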
I think you are looking for the AWS plugin for Elasticsearch (I guess you already installed it to configure your cluster): https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
It will allow you to create a repository mapped to an S3 bucket.
To use (create/restore/whatever) a snapshot, you need to create a repository first. Then, when you perform actions on a snapshot, Elasticsearch will manage it directly in your S3 bucket.
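A hedged sketch of registering such a repository with the cloud-aws plugin installed (the repository name, bucket, and region are placeholders, not values from the question):
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "region": "us-east-1"
  }
}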