AWS EMR cluster stuck in starting with custom AMI - amazon-web-services

I'm trying to run an EMR cluster with a custom AMI because bootstrapping takes 13 minutes. I created an AMI from an m5.large instance with all of the software installed. This instance is based on the Amazon Linux 2 AMI. I created the image from the console (Actions, Image, Create image). When I run the cluster, it starts the instances on EC2, but the cluster stays stuck in the starting state.
How can I solve the problem?

Related

Hardening AWS EC2 Instances

I launched an AWS ECS cluster with 4 EC2 instances using the ECS-optimized AMI 2 years ago. The system has been working fine, but to meet system hardening compliance requirements I need to update the EC2 instances in my ECS cluster to the latest ECS-optimized AMI.
I can take the latest AMI and update the instances manually, but how can I automate this process so that it repeats, say, every 3 months? My Auto Scaling group should update the instances to the latest ECS-optimized AMI released by Amazon.
My EC2 instances are in an Auto Scaling group. What automation ideas can I implement here?
Any AWS doc or GitHub repo link for achieving this would also be very helpful.
Thanks in advance
Step 1: Read the latest AMI IDs from AWS Systems Manager Parameter Store, and use EventBridge to set up notifications for when they change.
Step 2: Write a Lambda function that updates your launch configuration, which holds the AMI ID.
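The two steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses a launch template instead of the launch configuration mentioned in the answer (launch configurations cannot be edited in place), it assumes the Auto Scaling group tracks the template's $Latest version, and the environment variable names LAUNCH_TEMPLATE_ID and ASG_NAME are hypothetical. The trigger could be an EventBridge rule, for example one scheduled every 3 months.

```python
"""Sketch: move an Auto Scaling group onto the latest ECS-optimized AMI."""

# Public Parameter Store entry for the recommended ECS-optimized Amazon Linux 2 AMI.
AMI_PARAMETER = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"


def roll_to_latest_ami(ssm, ec2, autoscaling, launch_template_id, asg_name):
    """Create a launch template version with the latest AMI, then refresh the group."""
    ami_id = ssm.get_parameter(Name=AMI_PARAMETER)["Parameter"]["Value"]

    # Copy the latest template version and override only the image ID.
    ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion="$Latest",
        LaunchTemplateData={"ImageId": ami_id},
    )

    # Replace instances gradually (assumption: the group uses the $Latest version).
    autoscaling.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Preferences={"MinHealthyPercentage": 75},
    )
    return ami_id


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without AWS access.
    import os
    import boto3

    return roll_to_latest_ami(
        boto3.client("ssm"),
        boto3.client("ec2"),
        boto3.client("autoscaling"),
        os.environ["LAUNCH_TEMPLATE_ID"],  # hypothetical variable name
        os.environ["ASG_NAME"],            # hypothetical variable name
    )
```

Passing the clients in as arguments keeps the AWS calls in one small function that can be tried against stub objects before it is deployed.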

ECS - User Data For EC2 instances

I am trying to create a Docker image based on httpd that includes custom information about the Docker image. For that, I am trying to set ECS_ENABLE_CONTAINER_METADATA=true in /etc/ecs/ecs.config.
I am trying to do this in the user data of the ECS instance. The first thing I noticed is that there is no option to specify user data while creating the cluster.
I then tried copying the launch configuration and editing the user data, as described in the Stack Overflow question below:
ECS, how to add user-data after creating ecs instance
But when I try to run tasks, I find that no ECS instance is linked to the cluster.
Any suggestions, if you have run into a similar issue?
It seems that the ECS instance is not registered with the cluster. You need to ensure that the AMI you use to create the ECS instance has the ECS agent enabled and running. The full list of AMIs is available in the ECS developer docs under container instances.
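Besides the agent itself, one possible cause (an assumption, since the question does not show the edited user data) is that the new user data no longer sets ECS_CLUSTER, in which case the agent registers with the cluster named default. A minimal sketch of user data that sets both values, with hypothetical function names:

```python
import base64


def build_ecs_user_data(cluster_name, enable_metadata=True):
    """Return a shell script that writes ECS agent settings at first boot."""
    lines = [
        "#!/bin/bash",
        # Without ECS_CLUSTER the agent joins the cluster named "default".
        f"echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config",
    ]
    if enable_metadata:
        lines.append("echo ECS_ENABLE_CONTAINER_METADATA=true >> /etc/ecs/ecs.config")
    return "\n".join(lines) + "\n"


def encode_user_data(script):
    """Base64-encode the script, the form the launch template API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")
```

For example, build_ecs_user_data("my-cluster") returns a three-line script, where my-cluster is a placeholder for the real cluster name.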

Do we need to create two AMIs for master and core in EMR?

I need to create an AWS EMR cluster for a Spark job with one master and 4 core nodes with auto scaling. I need different instance types for the master and core nodes, with Ubuntu 16.0 installed on them. So do I need to create two AMIs, one for the master and one for the slave nodes?
Amazon EMR has its own library of AMIs. You can select the AMI version when launching the cluster.
You can create a custom AMI, but it must be based on Amazon Linux.
See: Using a Custom AMI - Amazon EMR
If you wish to launch a Hadoop cluster with your own Ubuntu AMI, you cannot use the Amazon EMR service. You will need to launch and configure it yourself on Amazon EC2 instances.
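For the Amazon Linux case, a single custom AMI can be shared by instance groups of different types, so separate master and core AMIs should not be needed. The sketch below builds a request that could be passed to boto3's run_job_flow; the cluster name, release label, instance types, and the default role names are assumptions chosen for illustration, not values from the question.

```python
def build_cluster_request(custom_ami_id, master_type, core_type, core_count):
    """Build run_job_flow arguments: one custom AMI, two instance types."""
    return {
        "Name": "spark-cluster",        # placeholder cluster name
        "ReleaseLabel": "emr-5.30.0",   # assumed release; custom AMIs need 5.7.0+
        "CustomAmiId": custom_ami_id,   # must be based on Amazon Linux
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master",
                    "InstanceRole": "MASTER",
                    "InstanceType": master_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": core_type,
                    "InstanceCount": core_count,
                },
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }
```

With credentials configured, the call would look like boto3.client("emr").run_job_flow(**build_cluster_request("ami-...", "m5.xlarge", "r5.2xlarge", 4)), where the AMI ID and instance types are placeholders.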

For AWS ASG, how to set up custom readiness check for new instances?

We have an AutoScaling Group that runs containers (using ECS). When we add OR replace EC2 instances in the ASG, they don't have the docker images we want on them. So, we run a few docker pull commands using cloud-init to fetch the images when they boot up.
However, the ASG thinks that the new instance is ready, and terminates an old instance. But in reality, this new instance isn't ready until all docker images have been pulled.
E.g.
Let's say my ASG's desired count is 5, and I have to pull 5 containers using cloud-init. Now, I want to replace all EC2 instances in my ASG.
As new instances start to boot up, the ASG will terminate old instances. But because of the docker pull lag, there will be a time during the deploy when the number of actually valid instances is fewer than 3 or 2.
How can I "mark an instance ready" only when cloud-init is finished?
Note: I think CloudFormation can bridge this communication gap using cfn-bootstrap. But I'm not using CloudFormation.
What you're looking for is Auto Scaling lifecycle hooks. You can keep an instance in the Pending:Wait state until your docker pull has completed, and then move the instance to InService. All of this can be done with the AWS CLI, so it should be achievable by running an AWS Auto Scaling command before and after your docker commands.
The link to the documentation I have provided explains this feature in detail and provides great examples on how to use it.
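As a rough sketch of that flow in Python rather than the CLI: assuming a launch lifecycle hook (autoscaling:EC2_INSTANCE_LAUNCHING) already exists on the group, a script run from cloud-init could pull the images and then complete the hook. The function names and arguments here are hypothetical; complete_lifecycle_action is the boto3 Auto Scaling call.

```python
import subprocess


def docker_pull(image):
    """Pull one image, raising if the docker command fails."""
    subprocess.run(["docker", "pull", image], check=True)


def pull_images_then_complete(autoscaling, asg_name, hook_name, instance_id,
                              images, pull=docker_pull):
    """Pull every image, then release the instance from Pending:Wait."""
    try:
        for image in images:
            pull(image)
        result = "CONTINUE"  # instance moves on to InService
    except (subprocess.CalledProcessError, OSError):
        result = "ABANDON"   # a launching instance is terminated and replaced

    autoscaling.complete_lifecycle_action(
        AutoScalingGroupName=asg_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult=result,
    )
    return result
```

On the instance, autoscaling would be boto3.client("autoscaling") and the instance ID would come from instance metadata. The hook's heartbeat timeout needs to be longer than the slowest expected pull.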

How to submit EMR jobs from a separate EC2 Linux instance

I would like to know how to execute EMR jobs and Redshift COPY commands from a separate EC2 Linux instance.
Configuration:
2 EMR EC2 instances running (one master and one task)
1 EC2 Linux instance
1 EC2 Windows instance
I am able to execute jobs (S3 file movement, an EMR script residing in S3, hard-coded COPY commands to load into Redshift) from the EC2 Windows instance (with the AWS CLI downloaded).
Regards
Sanjeeb
You can use the EMR CLI to submit a step.
Here's more documentation from the EMR docs on how to do this.
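The same submission can also be done from the Linux instance with boto3, which uses the same API operation as the CLI's aws emr add-steps command. A minimal sketch, assuming an EMR release that ships command-runner.jar (4.x or later); the function names are hypothetical:

```python
def build_command_step(name, command_args, action_on_failure="CONTINUE"):
    """Describe a step that runs a command on the master node."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": list(command_args),
        },
    }


def submit_step(emr, cluster_id, step):
    """Add one step to a running cluster and return its step ID."""
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

With credentials or an instance role configured, emr would be boto3.client("emr"), and a step could be built as build_command_step("my job", ["spark-submit", "s3://my-bucket/job.py"]), where the bucket and script names are placeholders.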