AWS Elasticsearch - Automating manual snapshots

The requirement - A customer requires an automated mechanism that takes a manual snapshot of an AWS Elasticsearch domain (production) on a daily basis. The target of the snapshot is an AWS S3 bucket.
Expected flow
Schedule daily @ 2am --> start process --> take snapshot --> wait 5 min --> check snapshot status (success/in_progress/failed)
If state==IN_PROGRESS, check the snapshot status again, up to 10 times, at 5-minute intervals
state==SUCCESS - end process (success)
state==IN_PROGRESS - after 10 retries (50 mins), end process (failed)
state==FAILED - end process (failed)
If previous step failed, send push notification (Slack/Teams/Email/etc.)
Motivation - The automated snapshots taken by AWS can be used for disaster recovery or to recover from a failed upgrade, but they cannot be used if someone accidentally (yes, it happened) deletes the whole Elasticsearch cluster.
I haven't found an out-of-the-box Lambda/mechanism that meets these requirements. Suggestions? Thoughts?
P.S. - I did a POC with AWS Step Functions + a Lambda in a VPC, which seems to work, but I'd rather use a managed service or an actively maintained open-source project.
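For context, the POC's Lambda boils down to two signed REST calls against the domain. Here is a rough sketch, assuming an S3 snapshot repository has already been registered on the domain; the endpoint, region, and repository name are placeholders, and the wait/retry loop is left to Step Functions:

```python
# Rough sketch (not a definitive implementation): the two Elasticsearch REST
# calls behind "take snapshot" and "check snapshot status".
# ES_ENDPOINT, REGION and REPO are placeholders; the snapshot repository is
# assumed to be registered against an S3 bucket already.
import datetime

import boto3
import requests
from requests_aws4auth import AWS4Auth

ES_ENDPOINT = "https://my-domain.eu-west-1.es.amazonaws.com"  # placeholder
REGION = "eu-west-1"                                          # placeholder
REPO = "manual-snapshots"                                     # placeholder

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   REGION, "es", session_token=credentials.token)


def take_snapshot():
    """Start a manual snapshot named after today's date."""
    name = "snapshot-" + datetime.date.today().isoformat()
    resp = requests.put(f"{ES_ENDPOINT}/_snapshot/{REPO}/{name}", auth=awsauth)
    resp.raise_for_status()
    return name


def snapshot_state(name):
    """Return SUCCESS / IN_PROGRESS / FAILED for a given snapshot."""
    resp = requests.get(f"{ES_ENDPOINT}/_snapshot/{REPO}/{name}", auth=awsauth)
    resp.raise_for_status()
    return resp.json()["snapshots"][0]["state"]
```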

In case you accidentally delete your AWS Elasticsearch domain, AWS Support can help you recover the domain along with its latest snapshot on a best-effort basis. This is not listed in the documentation since it ideally shouldn't be your first resort.
Assuming this will be a rare scenario, you should be fine. However, if you think there is a fair chance of your AWS ES cluster getting deleted again and again, you will be better off setting up a Lambda function to save the latest snapshot in your own S3 bucket. This will also save you from depending on AWS Support.

AWS Elasticsearch has accidental-delete protection. In case you delete your domain by mistake, AWS Elasticsearch can recover it within 14 days.
Hope this solves your purpose.

Related

Writing data from AWS RDS SQL Server to S3 bucket

According to this: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/User.SQLServer.Options.S3-integration.html we should be able to write a file from RDS to S3. But when we try it, it fails with:
"blocked because RDS is a managed service with SLA and guard rails to help deliver it."
Does anyone know a way around this?
It seems like running your task would interfere with SLAs, so it would be directly related to the Note mentioned in the link you shared:
Note: S3 integration tasks share the same queue as native backup and
restore tasks. At maximum, you can have only two tasks in progress at
any time in this queue. Therefore, two running native backup and
restore tasks will block any S3 integration tasks.
There must be some automated backup activity going on when you run your task.
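One way to confirm that is to peek at the task queue before starting the transfer. Below is a rough sketch; it assumes the rds_fn_task_status function described on the S3-integration page linked above (check that page for the exact arguments and column names), and the connection string is a placeholder:

```python
# Sketch: check the RDS SQL Server task queue before starting an S3
# integration task. Native backup/restore and S3 integration tasks share this
# queue, and at most two can be in progress at a time.
# The connection string is a placeholder; rds_fn_task_status and its columns
# are taken from the linked AWS doc page - verify the exact signature there.
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=my-instance.xxxxxxxx.eu-west-1.rds.amazonaws.com,1433;"
            "DATABASE=msdb;UID=admin;PWD=...")  # placeholder

with pyodbc.connect(CONN_STR) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT task_id, task_type, lifecycle "
                   "FROM msdb.dbo.rds_fn_task_status(NULL, 0)")
    active = [row for row in cursor.fetchall()
              if row.lifecycle in ("CREATED", "IN_PROGRESS")]
    if len(active) >= 2:
        print("Task queue is full (likely backup/restore tasks):", active)
    else:
        print("Queue has room; the S3 integration task should not be blocked.")
```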

Amazon Personalize - how to delete a Batch Inference Job in "Create in progress"?

I started with Amazon Personalize yesterday with the help of this tutorial. Since it took more time than expected, in the middle of the notebook I decided to postpone it and deleted all resources (CloudFormation stack, Jupyter notebook, S3 bucket). Evidently, something went wrong. I still have a Dataset Group with status 'Active'.
I cannot delete it, because there is one Batch Inference job with status 'Create in progress'. It has this status since yesterday, now for more than 12 hours.
How can I delete all of this? What charges should I expect?
There is no option to stop 'Create in progress' for any of the Personalize resources. This is the downside of a 'black box' service.
I believe the best option would be to contact the AWS Support team; they should be able to terminate it manually.
I have had cases where resource creation took more than 12 hours; the time depends mostly on the dataset size and the type of job. If you don't want to contact Support, there is no option other than waiting for it to complete.
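If you do end up waiting it out, a small boto3 polling loop can at least tell you the moment the job leaves 'CREATE IN_PROGRESS' so you can delete the dataset group right away. A rough sketch (both ARNs are placeholders):

```python
# Sketch: wait for the stuck batch inference job to finish, then delete the
# dataset group. Both ARNs are placeholders for your own resources.
import time

import boto3

personalize = boto3.client("personalize")

JOB_ARN = "arn:aws:personalize:eu-west-1:123456789012:batch-inference-job/my-job"
DSG_ARN = "arn:aws:personalize:eu-west-1:123456789012:dataset-group/my-group"

while True:
    status = personalize.describe_batch_inference_job(
        batchInferenceJobArn=JOB_ARN)["batchInferenceJob"]["status"]
    print("Batch inference job status:", status)
    if status in ("ACTIVE", "CREATE FAILED"):
        break
    time.sleep(300)  # poll every 5 minutes

# Once nothing is in "CREATE IN_PROGRESS" anymore, the dataset group
# (and any remaining child resources) can be deleted.
personalize.delete_dataset_group(datasetGroupArn=DSG_ARN)
```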

AWS Glue request limit

I have some Lambdas that request schemas from AWS Glue. I would like to know whether there is a limit on requests to AWS Glue after which Glue cannot handle them - load testing, in other words.
I have not found anything about it in the official documentation.
Thanks
The various default, per-region limits for the AWS Glue service are listed at the below link. You can request increases to these limits via the support console.
https://docs.aws.amazon.com/glue/latest/dg/troubleshooting-service-limits.html
These limits are not a guaranteed capacity unless there is an SLA defined for the service, which I don't think Glue has. One would assume that EC2 is the backing service, though, so capacity should theoretically not be an issue. With no SLA, you will only know the true availability of the service by running your workflow over a long period of time.
Have a look here:
https://docs.aws.amazon.com/general/latest/gr/glue.html
As of today (2020/01/27):
Number of jobs per trigger: 50
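Besides the hard limits above, what you are most likely to hit under load is API throttling, which boto3 can absorb with its built-in retry modes. A rough sketch (the database and table names are placeholders):

```python
# Sketch: read a table schema from Glue with adaptive client-side retries,
# so bursts of requests back off instead of failing on throttling errors.
# Database and table names are placeholders.
import boto3
from botocore.config import Config

glue = boto3.client(
    "glue",
    config=Config(retries={"mode": "adaptive", "max_attempts": 10}),
)


def get_columns(database, table):
    """Return the column definitions of one Glue table."""
    response = glue.get_table(DatabaseName=database, Name=table)
    return response["Table"]["StorageDescriptor"]["Columns"]


print(get_columns("my_database", "my_table"))
```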

KMS requests automatically increasing on AWS

KMS requests are continuously increasing on my AWS account. I am on the Free Tier. My monthly quota is 20,000 requests, but in the first 7 days I've used 45% of it (9,000 requests).
Please tell me how to control this number. I have no instance running at the moment, yet the requests keep increasing. No instances, no KMS keys, no web apps, no deployments, and I don't know why this is happening. I searched Google but couldn't find anything helpful.
EDIT:
First I created an instance and deployed a Django project. After 3 days I terminated that instance. Now I have no services running, yet in the last 2 days the KMS request count has increased by another 10%.
KMS is used by a number of other AWS services, and there is also a default key. Some examples of where this can be used:
Encrypting data of any type
AWS Certificate Manager SSL certs in an ELB/CloudFront
As for encryption, there are encrypted EBS volumes, Redshift data, S3 bucket data, parameters in the EC2 Parameter Store, etc. If you still have no idea what is causing the KMS usage, you might want to use CloudTrail to log the calls. Note that CloudTrail itself can encrypt data and essentially eat up your KMS allocation, and the logs it stores in S3 count against your S3 allocation.
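If CloudTrail is enabled, a short event lookup like the sketch below can show which service or principal is behind the KMS calls (this only covers the management-event history that CloudTrail keeps):

```python
# Sketch: list recent KMS API calls recorded by CloudTrail and print which
# event was called, when, and which AWS service invoked it (if any).
import json

import boto3

cloudtrail = boto3.client("cloudtrail")

events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventSource",
                       "AttributeValue": "kms.amazonaws.com"}],
    MaxResults=50,
)["Events"]

for event in events:
    detail = json.loads(event["CloudTrailEvent"])
    invoked_by = detail.get("userIdentity", {}).get("invokedBy", "n/a")
    print(event["EventTime"], event["EventName"], "invoked by:", invoked_by)
```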
After trying a lot, I finally sorted this out myself.
The problem was a leftover S3 bucket; after deleting it, the KMS requests stopped increasing.

Delete AWS CodeDeploy revisions from S3 after successful deployment

I am using the CodeDeploy add-on for Bitbucket to deploy my code directly from a Bitbucket Git repository to my EC2 instances via AWS CodeDeploy. However, after a while, I have a lot of revisions in my CodeDeploy console, all stored in one S3 bucket. What should I do to stop old CodeDeploy revisions from using up my S3 storage?
Is it possible to delete these revisions automatically after a successful deployment?
Is it possible to delete them automatically once there are X successful revisions? For example, delete an old revision once we have three newer successful revisions.
CodeDeploy keeps every revision from Bitbucket because the service always needs the last successful revision for features like automatic rollback. So the previous revision can't simply be overwritten when doing a deployment. But all revisions older than the last successful one can be deleted.
Unfortunately, CodeDeploy doesn't have a good/elegant way to handle those obsolete revisions at the moment. It would be great if there were an overwrite option when Bitbucket pushes to S3.
CodeDeploy is purely a deployment tool; it cannot manage the revisions in the S3 bucket.
I would recommend you look into lifecycle management for S3. Since you are using a version-controlled bucket (I assume), there is always one latest version and zero or more obsolete versions. You can set a lifecycle configuration of type "NoncurrentVersionExpiration" so that the obsolete versions are deleted after some number of days.
This method still cannot maintain a fixed number of deployments, since AWS only lets you specify the lifecycle in days, but it's probably the best alternative for your use case.
[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-set-lifecycle-configuration-intro.html
[2] http://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html
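For reference, a rough sketch of such a rule with boto3 (the bucket name and retention period are placeholders; the same rule can also be created from the S3 console):

```python
# Sketch: expire noncurrent object versions in the CodeDeploy revisions bucket
# after 30 days. Bucket name and day count are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-codedeploy-revisions",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-revisions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```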
CodeDeploy does not have a feature like Jenkins' "keep the last X [successful or not] runs".
However, with an S3 lifecycle rule, you can expire (delete) the S3 objects automatically after, for example, 3 months.
On one hand, this solution is a nice FinOps action when there is constant activity during the expiration window (at least 3 deployments): it keeps CodeDeploy's automatic rollback process working while reducing the S3 cost.
On the other hand, this solution is less effective when you have spiky activity or, worse, no deployments at all during the chosen S3 expiration delay: if a deployment happens 12 months after the last one and fails, CodeDeploy will not be able to roll back, since the previous artifacts are no longer available in S3.
As mitigation, I recommend using S3 Intelligent-Tiering: it can divide the S3 cost by 4 without interfering with CodeDeploy's capabilities. You can also set an expiration of 12 months to delete the oldest artifacts.
Another solution is to code a Lambda, scheduled by a weekly CloudWatch Events rule, that will (a sketch follows this list):
List deployments, filtering on your own success/fail status criteria
Get the deployment details for each
Filter these deployments again using your criteria (date, user, ...)
Delete the S3 objects using the deployment details
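A rough sketch of that Lambda (the application name, deployment group, and retention count are placeholders, and pagination is left out for brevity):

```python
# Sketch: keep the newest KEEP_LAST successful revisions of a deployment group
# and delete the S3 objects behind the older ones.
# APPLICATION, DEPLOYMENT_GROUP and KEEP_LAST are placeholders.
import boto3

codedeploy = boto3.client("codedeploy")
s3 = boto3.client("s3")

APPLICATION = "my-app"
DEPLOYMENT_GROUP = "my-deployment-group"
KEEP_LAST = 3


def lambda_handler(event, context):
    # 1. List successful deployments for the group (pagination not handled).
    deployment_ids = codedeploy.list_deployments(
        applicationName=APPLICATION,
        deploymentGroupName=DEPLOYMENT_GROUP,
        includeOnlyStatuses=["Succeeded"],
    )["deployments"]

    # 2. Fetch details so we can sort by creation time, newest first.
    details = [codedeploy.get_deployment(deploymentId=d)["deploymentInfo"]
               for d in deployment_ids]
    details.sort(key=lambda d: d["createTime"], reverse=True)

    # 3. Delete the S3 objects behind everything older than the kept revisions.
    for info in details[KEEP_LAST:]:
        revision = info.get("revision", {})
        if revision.get("revisionType") != "S3":
            continue
        location = revision["s3Location"]
        s3.delete_object(Bucket=location["bucket"], Key=location["key"])
        print("Deleted", location["bucket"], location["key"])
```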