EBS volumes on EMR cluster with CloudFormation - amazon-web-services

I'm trying to launch an EMR cluster using AWS CloudFormation. I'd like to add EBS volumes to my core instances, however neither the AWS::EMR::Cluster nor the AWS::EMR::InstanceGroupConfig resource types mention anything about EBS volumes. I see you can attach EBS volumes via the API, but CloudFormation will not accept these settings.
Is this possible to do via CloudFormation?

The ability to launch Amazon EMR clusters with attached EBS volumes was introduced in February 2016 (a month prior to this question being posted).
It is likely that CloudFormation has not yet been updated to enable this additional configuration. It is quite common for CloudFormation to lag behind new feature releases.
When available, the configuration will likely be added to the Amazon Elastic MapReduce Cluster JobFlowInstancesConfig InstanceGroupConfig.

Related

How do I create an EBS snapshot without downtime to the instance?

Is it possible to create a snapshot of an EBS Volume without downtime via Terraform?
I am currently looking at documentation about resource aws_dlm_lifecycle_policy.
(https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dlm_lifecycle_policy)
It seems to be possible. A DLM lifecycle policy simply automates snapshot creation.
Snapshots are created asynchronously, even from an attached volume that is in use. See the official AWS documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html
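To illustrate the asynchronous behaviour outside of Terraform, here is a minimal Python sketch. It assumes a boto3 EC2 client is passed in; the helper name snapshot_volume and the description text are made up for this example. The call returns as soon as the snapshot has been started, while the volume stays attached and in use.

```python
from typing import Tuple


def snapshot_volume(ec2, volume_id: str,
                    description: str = "on-demand snapshot") -> Tuple[str, str]:
    """Start a snapshot of an EBS volume and return (snapshot_id, state).

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The volume does not need to be detached or the instance stopped.
    """
    response = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    # The state is normally "pending" at this point; the copy continues
    # in the background.
    return response["SnapshotId"], response["State"]
```

If you need to block until the snapshot has finished, boto3 also offers ec2.get_waiter("snapshot_completed").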

Enable AWS RDS Auto Scaling via CloudFormation

I have RDS instances running in my AWS account that were created via a CloudFormation template. Recently the storage filled up, and as an immediate remediation I increased the storage size from the default 20 GB to 50 GB in the console.
Now I am considering modifying my CFN template so that RDS storage auto-scaling is enabled. But the AWS documentation says auto-scaling can be enabled through the CLI, the RDS API, and the console, with no mention of CloudFormation.
Is there any way to enable auto-scaling via CloudFormation?
There is no direct option for that, as explained in the following GitHub issue:
AWS::RDS::DBInstance (add Storage Auto Scaling)
However, it seems that if you set MaxAllocatedStorage, the storage autoscaling will get enabled.
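As a rough sketch of what that could look like, the Python snippet below builds a CloudFormation template as a dict and prints it as JSON. The logical names (AppDatabase, DBUser, DBPassword), the engine, and the instance class are placeholders for this example, not values from the question. The check that the maximum exceeds the allocated size is just a sanity check chosen for this sketch.

```python
import json
from typing import Any, Dict


def rds_template(allocated_gb: int, max_allocated_gb: int) -> Dict[str, Any]:
    """Build a CloudFormation template with storage autoscaling enabled."""
    if max_allocated_gb <= allocated_gb:
        raise ValueError("MaxAllocatedStorage must exceed AllocatedStorage")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "DBUser": {"Type": "String"},
            "DBPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "AppDatabase": {  # hypothetical logical ID
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    "Engine": "mysql",
                    "DBInstanceClass": "db.t3.micro",
                    "AllocatedStorage": str(allocated_gb),
                    # Upper limit (in GiB) that storage may grow to.
                    "MaxAllocatedStorage": max_allocated_gb,
                    "MasterUsername": {"Ref": "DBUser"},
                    "MasterUserPassword": {"Ref": "DBPassword"},
                },
            }
        },
    }


template = rds_template(allocated_gb=50, max_allocated_gb=200)
print(json.dumps(template, indent=2))
```

Since the storage was already raised to 50 GB in the console, AllocatedStorage in the template should be brought in line with that value so the stack and the real instance agree.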

Hardening AWS EC2 Instances

I launched an AWS ECS cluster with 4 EC2 instances using the ECS-optimized AMI 2 years ago. The system has been working fine, but due to system hardening compliance I need to update the cluster's EC2 instances to the latest ECS-optimized AMI.
I can take the latest AMI and update the instances manually, but how can I automate this process continuously? Say, every 3 months, my Auto Scaling group should update the instances with the latest ECS-optimized AMI released by Amazon.
My EC2 instances are in an Auto Scaling group. What automation ideas can I implement here?
Any AWS doc or GitHub repo link to achieve this would also be very helpful.
Thanks in advance
Step 1: You can read the latest AMI IDs from AWS Systems Manager Parameter Store and use EventBridge to set up notifications for when they change.
Step 2: Write a Lambda function to update your launch configuration, which holds the AMI ID.
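A minimal sketch of Step 2 in Python follows. Note that it swaps in a launch template instead of a launch configuration, because a launch template can simply get a new version with a different AMI. It assumes the Auto Scaling group points at the $Latest version of that template, and the LAUNCH_TEMPLATE_ID environment variable is a made-up name for this example. The SSM path is the public parameter for the recommended Amazon Linux 2 ECS-optimized AMI.

```python
from typing import Optional

ECS_AMI_PARAM = (
    "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
)


def refresh_launch_template(ssm, ec2, launch_template_id: str) -> Optional[str]:
    """Create a new template version if a newer ECS-optimized AMI exists.

    `ssm` and `ec2` are boto3 clients. Returns the new AMI ID, or None
    when the latest template version already uses it.
    """
    latest_ami = ssm.get_parameter(Name=ECS_AMI_PARAM)["Parameter"]["Value"]
    version = ec2.describe_launch_template_versions(
        LaunchTemplateId=launch_template_id, Versions=["$Latest"]
    )["LaunchTemplateVersions"][0]
    if version["LaunchTemplateData"].get("ImageId") == latest_ami:
        return None
    # Base the new version on the current one and override only the AMI.
    ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=str(version["VersionNumber"]),
        LaunchTemplateData={"ImageId": latest_ami},
    )
    return latest_ami


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without boto3.
    import os
    import boto3

    new_ami = refresh_launch_template(
        boto3.client("ssm"),
        boto3.client("ec2"),
        os.environ["LAUNCH_TEMPLATE_ID"],  # hypothetical variable name
    )
    return {"updated": new_ami is not None, "ami": new_ami}
```

Instances that are already running keep their old AMI until they are replaced, so you would still need to cycle them, for example with an instance refresh on the Auto Scaling group.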

Can we use a preconfigured AMI to create an AWS EMR cluster?

I am not good at writing shell scripts or bootstrap actions for EMR. Can I use a preconfigured AMI snapshot to create the cluster?
You can use a preconfigured AMI for EMR. However, there are some restrictions in that you must start with a supported EMR AMI. I have done this many times to create encrypted root volumes for EMR (copying the AMI and enabling encryption).
Amazon EMR now supports launching clusters with custom Amazon Linux AMIs
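For reference, here is a small Python sketch of how a custom AMI might be passed when creating a cluster through the API. It only builds the request; the cluster name, instance types, and counts are placeholders, and the role names are the EMR default roles, which may differ in your account. As far as I know, custom AMIs require release emr-5.7.0 or later.

```python
from typing import Any, Dict


def emr_custom_ami_request(custom_ami_id: str,
                           release_label: str = "emr-5.7.0") -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow call with a custom AMI."""
    return {
        "Name": "custom-ami-cluster",  # placeholder name
        "ReleaseLabel": release_label,
        "CustomAmiId": custom_ami_id,
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m4.large", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m4.large", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed, the request could then be sent with boto3.client("emr").run_job_flow(**emr_custom_ami_request(your_ami_id)), where your_ami_id is the ID of your own AMI.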

Autoscaling a running Hadoop cluster setup on AWS EC2

My goal is to understand how I can auto-scale a Hadoop cluster on AWS EC2.
I am exploring AWS offerings from an elastic scaling perspective, both for Hadoop as a service (EMR) and for Hadoop on EC2.
For EMR, I gathered that performance metrics can be monitored using CloudWatch and the user can be alerted once they reach a set threshold; the cluster can then be scaled up or down depending on its utilization.
This approach would require some custom implementation to automate the steps. (Correct me if I am missing anything here.)
For Hadoop on EC2, I came across the Auto Scaling option, which can add or remove instances according to configured scaling policies.
But it is not clear to me how a newly added node would get bootstrapped into the cluster automatically. How would YARN know that it can spawn a new container on this newly added node?
Does auto-scaling work for a master-slave kind of setup as well, or is it limited to web applications?
There is also 'Qubole', which offers services to manage Hadoop on AWS. Should that be used to manage cluster scaling automatically?