I am not good at writing shell scripts or bootstrap actions for EMR. Can I use a preconfigured AMI snapshot to create the cluster?
You can use a preconfigured AMI for EMR. However, there are some restrictions: you must start from a supported EMR AMI. I have done this many times to create encrypted root volumes for EMR (by copying the AMI and enabling encryption on the copy).
See: Amazon EMR now supports launching clusters with custom Amazon Linux AMIs
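As a rough sketch of both steps, the snippet below builds the keyword arguments for boto3's EC2 copy_image call (to make an encrypted copy of a supported base AMI) and EMR run_job_flow call (to launch a cluster from that copy via CustomAmiId). It only constructs dictionaries, so it runs without AWS credentials. The AMI IDs, cluster name, instance types, release label and the default role names EMR_EC2_DefaultRole and EMR_DefaultRole are placeholder assumptions, not values from the answer above.

```python
"""Sketch: request parameters for an EMR cluster built from a custom AMI.

Nothing here calls AWS; the functions only build the kwargs that would be
passed to boto3's ec2.copy_image and emr.run_job_flow.
"""


def encrypted_copy_params(source_ami_id, source_region, name, kms_key_id=None):
    """kwargs for ec2.copy_image that produce an encrypted copy of an AMI."""
    params = {
        "SourceImageId": source_ami_id,
        "SourceRegion": source_region,
        "Name": name,
        "Encrypted": True,  # the snapshots behind the copied AMI are encrypted
    }
    if kms_key_id is not None:
        # When omitted, EC2 uses the default KMS key for EBS encryption.
        params["KmsKeyId"] = kms_key_id
    return params


def custom_ami_cluster_params(name, custom_ami_id, release_label="emr-5.36.0"):
    """kwargs for emr.run_job_flow that launch every node from custom_ami_id.

    Placeholder values: instance types, counts and role names are assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # custom AMIs need emr-5.7.0 or later
        "CustomAmiId": custom_ami_id,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, the dictionaries would be passed as ec2.copy_image(**params) and emr.run_job_flow(**params); copy_image returns the new ImageId, which is what goes into CustomAmiId once the copy is available.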
I am aware of the AWS CloudFormation EMR resource for creating clusters, but I could not find any instructions for EMR notebooks. Is there a CloudFormation resource for EMR notebooks, or a similar alternative?
EMR Notebooks can only be created manually using the AWS EMR console. From the documentation (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-create.html):
You create an EMR notebook using the Amazon EMR console. Creating notebooks using the AWS CLI or the Amazon EMR API is not supported.
Since there is no API for this, I don't think there is a way to create notebooks using CloudFormation or similar tools.
I need to create an AWS EMR cluster for a Spark job with one master and four core nodes, with auto scaling. I need different instance types for the master and core nodes, with Ubuntu 16 installed on them. Do I need to create two AMIs, one for the master and one for the slaves?
Amazon EMR has its own library of AMIs. You can select the AMI version when launching the cluster.
You can create a custom AMI, but it must be based on Amazon Linux.
See: Using a Custom AMI - Amazon EMR
If you wish to launch a Hadoop cluster with your own Ubuntu AMI, you cannot use the Amazon EMR service. You will need to launch and configure it yourself on Amazon EC2 instances.
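Ubuntu is not an option on EMR, but the rest of the question does not require two AMIs: with the custom AMI feature, a single CustomAmiId is set on the cluster and used for every node, while the master and core instance types are chosen per instance group. The sketch below builds run_job_flow keyword arguments under that assumption. The AMI ID, instance types, release label, role names and the managed-scaling limits are hypothetical placeholders; EMR managed scaling (ManagedScalingPolicy) is used here as one way to get auto scaling and requires emr-5.30.0 or later.

```python
def spark_cluster_params(custom_ami_id, master_type, core_type, core_count=4,
                         min_units=5, max_units=10,
                         release_label="emr-5.36.0"):
    """Build emr.run_job_flow kwargs: one AMI, different master/core types."""
    instance_groups = [
        {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": master_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": core_type, "InstanceCount": core_count},
    ]
    return {
        "Name": "spark-cluster",  # hypothetical cluster name
        "ReleaseLabel": release_label,
        # A single Amazon Linux-based AMI shared by master and core nodes.
        "CustomAmiId": custom_ami_id,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Lower and upper capacity limits for EMR managed scaling.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
            }
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

The returned dictionary would be passed to boto3.client("emr").run_job_flow(**params); only the instance types differ between the groups, not the image.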
I am very new to cloud-based services. I want to try Impala queries on AWS EMR and EC2. Is that possible? Can I create a free account for EC2/EMR, and if so, how?
Impala is not available as a standard option in Amazon EMR.
You would probably need to launch your own Hadoop cluster on Amazon EC2 instances.
However, the AWS Free Usage Tier only provides micro-sized EC2 instances, which are not appropriate for a Hadoop cluster.
I cannot find any documentation showing how to copy a Windows instance to an Amazon S3 bucket.
Can anyone help me with a step-by-step approach and suggest some links?
You cannot copy AMIs to S3. Instead, you can either create snapshots of your volumes or create another image (AMI).
I assume you're trying to back up your AMIs. There are some alternatives for doing that.
Create a new AMI from an existing running instance. Reference: Creating an Amazon EBS-Backed Windows AMI
Creating a Windows AMI from a Running Instance
You can create an AMI using the AWS Management Console or the command line. The following diagram summarizes the process for creating an Amazon EBS-backed AMI from a running EC2 instance. Start with an existing AMI, launch an instance, customize it, create a new AMI from it, and finally launch an instance of your new AMI. The steps in the following diagram match the steps in the procedure below. If you already have a running Amazon EBS-backed instance, you can go directly to step 4.
You can create images using the AWS CLI command create-image.
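The same operation is available from boto3 as ec2.create_image. The sketch below only builds its keyword arguments so it can run offline; the naming scheme (a hypothetical "windows-backup" prefix plus instance ID and date) is an assumption, and the instance ID in the usage line is a placeholder.

```python
from datetime import date


def create_image_params(instance_id, name_prefix="windows-backup",
                        no_reboot=False, today=None):
    """Build ec2.create_image kwargs for a backup AMI of a running instance."""
    today = today or date.today()
    return {
        "InstanceId": instance_id,
        # AMI names must be unique per account and region, so include a date.
        "Name": f"{name_prefix}-{instance_id}-{today.isoformat()}",
        "Description": f"Backup AMI of {instance_id}",
        # False lets EC2 reboot the instance first, giving a consistent
        # file system; True avoids downtime at the cost of consistency.
        "NoReboot": no_reboot,
    }
```

For example, boto3.client("ec2").create_image(**create_image_params("i-0123456789abcdef0")) would return the new ImageId.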
Create snapshots of your volumes; these snapshots are stored behind the scenes in S3. Reference: Creating an Amazon EBS Snapshot
You can create EBS snapshots using the AWS CLI command create-snapshot.
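A Windows instance usually has more than one volume, so a snapshot-based backup means one create-snapshot call per attached EBS volume. The sketch below assumes the input is a single instance dictionary as returned under Reservations[*]["Instances"] by boto3's ec2.describe_instances, and builds one ec2.create_snapshot kwargs dict per volume; the description format is a hypothetical choice.

```python
def snapshot_params_for_instance(instance):
    """Build ec2.create_snapshot kwargs for every EBS volume on an instance.

    `instance` is one element of Reservations[*]["Instances"] from
    ec2.describe_instances.
    """
    params = []
    for mapping in instance.get("BlockDeviceMappings", []):
        ebs = mapping.get("Ebs")
        if ebs is None:  # skip any mapping without an EBS volume
            continue
        params.append({
            "VolumeId": ebs["VolumeId"],
            # Record which instance and device the snapshot came from.
            "Description": f"{instance['InstanceId']} {mapping['DeviceName']}",
        })
    return params
```

Each dictionary would then be passed to ec2.create_snapshot(**p). Snapshots taken one volume at a time are not crash-consistent across volumes; the create-image route above covers all volumes of the instance in a single request.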
Resources
Copying an Amazon EBS Snapshot
Copying an AMI
I'm trying to launch an EMR cluster using AWS CloudFormation. I'd like to add EBS volumes to my core instances; however, neither the AWS::EMR::Cluster nor the AWS::EMR::InstanceGroupConfig resource type mentions anything about EBS volumes. I can see that EBS volumes can be attached via the API, but CloudFormation will not accept these settings.
Is this possible to do via CloudFormation?
The ability to launch Amazon EMR clusters with attached EBS volumes was introduced in February 2016 (a month prior to this question being posted).
It is likely that CloudFormation has not yet been updated to enable this additional configuration. It is quite common for CloudFormation to lag behind new feature releases.
When available, the configuration will likely be added to the Amazon Elastic MapReduce Cluster JobFlowInstancesConfig InstanceGroupConfig.
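CloudFormation has since caught up: the AWS::EMR::Cluster documentation now lists an EbsConfiguration property on InstanceGroupConfig, with EbsBlockDeviceConfigs (a VolumeSpecification plus VolumesPerInstance) and EbsOptimized. The sketch below generates a minimal template as JSON from Python; the cluster name, instance types, release label, volume size and type, and the default role names are placeholder assumptions, and the template has not been deployed as part of this answer.

```python
import json


def emr_cluster_template(core_volume_gb=100, volumes_per_instance=2):
    """Return a minimal CloudFormation template with EBS volumes on core nodes."""
    core_group = {
        "Name": "Core",
        "InstanceCount": 2,
        "InstanceType": "m5.xlarge",
        "Market": "ON_DEMAND",
        "EbsConfiguration": {
            "EbsBlockDeviceConfigs": [{
                "VolumeSpecification": {
                    "SizeInGB": core_volume_gb,
                    "VolumeType": "gp2",
                },
                # Number of identical volumes attached to each core node.
                "VolumesPerInstance": volumes_per_instance,
            }],
            "EbsOptimized": True,
        },
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "SparkCluster": {
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "ebs-example",
                    "ReleaseLabel": "emr-5.36.0",
                    "Applications": [{"Name": "Spark"}],
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        "MasterInstanceGroup": {
                            "Name": "Master",
                            "InstanceCount": 1,
                            "InstanceType": "m5.xlarge",
                            "Market": "ON_DEMAND",
                        },
                        "CoreInstanceGroup": core_group,
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Write the template to stdout, e.g. to save as template.json.
    print(json.dumps(emr_cluster_template(), indent=2))
```

Generating the template from code keeps the volume settings in one place; the same EbsConfiguration shape is also accepted for instance groups in the EMR API's run_job_flow call.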