I am new to AWS and am trying to run an AWS Data Pipeline that loads data from DynamoDB to S3, but I am getting the error below. Please help.
Unable to create resource for #EmrClusterForBackup_2020-05-01T14:18:47 due to: Instance type 'm3.xlarge' is not supported. (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: ValidationException; Request ID: 3bd57023-95e4-4d0a-a810-e7ba9cdc3712)
I was facing the same problem when my DynamoDB table and S3 bucket were in the us-east-2 region and the pipeline was in us-east-1, since I was not allowed to create a pipeline in us-east-2.
Once I created the DynamoDB table and S3 bucket in us-east-1 and the pipeline in the same region, it worked well even with the m3.xlarge instance type.
It is generally better to use the latest-generation instance types. They are technologically more advanced and sometimes even cheaper, so there is little reason to start on older generations; those remain available mainly for backward compatibility, for people who already run infrastructure on them.
I think this should help you: AWS will force you to use m3 instances if you use a DynamoDBDataNode or resizeClusterBeforeRunning.
https://aws.amazon.com/premiumsupport/knowledge-center/datapipeline-override-instance-type/?nc1=h_ls
I faced the same error but just changing from m3.xlarge to m4.xlarge didn't solve the problem. The DynamoDB table I was trying to export was in eu-west-2 but at the time of writing Data Pipeline is not available in eu-west-2. I found I had to edit the pipeline to change the following:
Instance type from m3.xlarge to m4.xlarge
Release label from emr-5.23.0 to emr-5.24.0 (not strictly necessary for export, but required for import [1])
Hardcode the region to eu-west-2
So the end result was a pipeline with those three values changed.
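For reference, here is a rough sketch of how those three edits could be applied programmatically with boto3 instead of in Architect. The pipeline ID and object ID are placeholders, and put_pipeline_definition replaces the whole definition, so in practice you would fetch the existing definition first and change only these fields:

import boto3

datapipeline = boto3.client("datapipeline", region_name="eu-west-1")

# Placeholder pipeline ID; take the real one from the console or list_pipelines().
pipeline_id = "df-0123456789EXAMPLE"

# Note: put_pipeline_definition overwrites the entire definition, so the real call
# must include every pipeline object, not just the EMR cluster shown here.
datapipeline.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "EmrClusterForBackup",
            "name": "EmrClusterForBackup",
            "fields": [
                {"key": "type", "stringValue": "EmrCluster"},
                {"key": "coreInstanceType", "stringValue": "m4.xlarge"},    # was m3.xlarge
                {"key": "masterInstanceType", "stringValue": "m4.xlarge"},  # was m3.xlarge
                {"key": "releaseLabel", "stringValue": "emr-5.24.0"},       # was emr-5.23.0
                {"key": "region", "stringValue": "eu-west-2"},              # hardcoded table region
            ],
        },
        # ... remaining pipeline objects go here ...
    ],
)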
[1] From: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-prereq.html
On-Demand Capacity works only with EMR 5.24.0 or later
DynamoDB tables configured for On-Demand Capacity are supported only when using Amazon EMR release version 5.24.0 or later. When you use a template to create a pipeline for DynamoDB, choose Edit in Architect and then choose Resources to configure the Amazon EMR cluster that AWS Data Pipeline provisions. For Release label, choose emr-5.24.0 or later.
I've prepared a Data Pipeline (in the eu-west-1 region) based on the template that exports a DynamoDB table to S3. My table is located in the eu-north-1 region, but when I put that under the parameter myDDBRegion I get the following error:
Object:EmrClusterForBackup
ERROR: 'region' has invalid value 'eu-north-1' It can have only one of the following values: [ap-south-1, eu-west-3, eu-west-2, eu-west-1, ap-northeast-2, ap-northeast-1, sa-east-1, ca-central-1, ap-southeast-1, ap-southeast-2, us-iso-east-1, eu-central-1, us-east-1, us-east-2, us-west-1, us-west-2]
Is this an unsupported region to export data from? If so, how do I export the data from this table?
Data Pipeline internally uses EMR to convert the data and store it in the target data store.
As per this documentation, eu-north-1 is not supported by Data Pipeline because it can't launch EMR/EC2 there, even though eu-north-1 itself supports the EMR service.
The Redshift CloudFormation resource AWS::Redshift::Cluster has two properties for configuring cross-region snapshot copy: DestinationRegion and SnapshotCopyGrantName. But after creating the stack with those properties, I see that cross-region snapshot copy is still disabled. Am I missing something?
This is not yet supported, but it's already on the roadmap: AWS::Redshift::SnapshotConfiguration.
For now you would have to use a custom resource in the form of a Lambda function that calls the AWS SDK's enable_snapshot_copy to enable snapshot copies. You could also use AWSUtility::CloudFormation::CommandRunner to do the same with the AWS CLI.
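A minimal sketch of such a Lambda-backed custom resource handler, assuming the cluster identifier, destination region, and (for KMS-encrypted clusters) the snapshot copy grant name are passed in as resource properties; the names below are illustrative, not a definitive implementation:

import boto3
import cfnresponse  # helper module available to inline (ZipFile) Lambda code in CloudFormation

def handler(event, context):
    redshift = boto3.client("redshift")
    props = event.get("ResourceProperties", {})
    try:
        if event["RequestType"] == "Create":
            kwargs = {
                "ClusterIdentifier": props["ClusterIdentifier"],
                "DestinationRegion": props["DestinationRegion"],
                "RetentionPeriod": int(props.get("RetentionPeriod", 7)),
            }
            # The grant is only needed when the cluster is encrypted with KMS.
            if props.get("SnapshotCopyGrantName"):
                kwargs["SnapshotCopyGrantName"] = props["SnapshotCopyGrantName"]
            redshift.enable_snapshot_copy(**kwargs)
        elif event["RequestType"] == "Delete":
            redshift.disable_snapshot_copy(ClusterIdentifier=props["ClusterIdentifier"])
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except Exception as exc:
        cfnresponse.send(event, context, cfnresponse.FAILED, {"Error": str(exc)})

The Lambda's execution role would also need redshift:EnableSnapshotCopy and redshift:DisableSnapshotCopy permissions on the cluster.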
Problem:
I have an EMR cluster (along with a number of other resources) defined in a CloudFormation template. I use the AWS REST API to provision my stack, and it works; I can provision the stack successfully.
Then I made one change: I specified a custom AMI for my EMR cluster. Now my stack creation fails because the EMR provisioning fails. The only information I can find is an error on the console: null: Error provisioning instances. Digging into each instance, I see that the master node failed with Status: Terminated. Last state change reason: Time out occurred during bootstrap.
I have S3 logging configured for my EMR cluster, but there are no logs in the S3 bucket.
Details:
I updated my CloudFormation template like so:
my_stack.cfn.yaml:
rMyEmrCluster:
  Type: AWS::EMR::Cluster
  ...
  Properties:
    ...
    CustomAmiId: "ami-xxxxxx" # <-- I added this
Custom AMI details:
I am adding a custom AMI because I need to encrypt the root EBS volume on all of my nodes. (This is required per documentation)
The steps I took to create my custom AMI:
I launched the base AMI that is used by AWS for EMR nodes: emr 5.7.0-ami-roller-27 hvm ebs (ID: ami-8a5cb8f3)
I created an image from my running instance
I created a copy of this image with EBS root volume encryption enabled, using the default encryption key. (I had to create my own base image from a running instance, because you are not allowed to create an encrypted copy of an AMI you don't own.)
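For what it's worth, that encrypted-copy step can also be scripted. A sketch with boto3, assuming the default EBS encryption key (the instance ID and image names are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an image from the running instance that was launched from the base AMI.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # placeholder: the running base instance
    Name="emr-base-copy",
)
ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])

# Copy the image with EBS encryption enabled; omitting KmsKeyId uses the
# account's default EBS encryption key.
ec2.copy_image(
    SourceRegion="us-east-1",
    SourceImageId=image["ImageId"],
    Name="emr-base-copy-encrypted",
    Encrypted=True,
)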
I wonder if this might be a permissions issue, or perhaps my AMI is misconfigured in some way. But it would be prudent for me to find some logs first, to figure out exactly what is going wrong with node provisioning.
I feel stupid. I accidentally used a completely unrelated AMI (a Red Hat 7 image) as the base image, instead of the AMI that EMR uses for its nodes by default: emr 5.7.0-ami-roller-27 hvm ebs (ami-8a5cb8f3).
I'll leave this question and answer up in case someone else makes the same mistake.
Make sure you create your custom AMI from the correct base AMI: emr 5.7.0-ami-roller-27 hvm ebs (ami-8a5cb8f3)
You mention that you created your custom AMI based on an EMR AMI. However, according to the documentation you linked, you should actually base your AMI on "the most recent EBS-backed Amazon Linux AMI". Your custom AMI does not need to be based on an EMR AMI, and indeed I suppose that doing so could cause some problems (though I have not tried it myself).
I am working on an EMR template with autoscaling.
While a static EMR setup with an instance group works fine, I cannot attach an
AWS::ApplicationAutoScaling::ScalableTarget
For troubleshooting I've split my template into two separate ones. In the first I create a normal EMR cluster (which works fine), and in the second I have a ScalableTarget definition, which fails to attach with this error:
11:29:34 UTC+0100 CREATE_FAILED AWS::ApplicationAutoScaling::ScalableTarget AutoscalingTarget EMR instance group doesn't exist: Failed to find Cluster XXXXXXX
Funny thing is that this cluster DOES exist.
I also had a look at IAM roles but everything seems to be ok there...
Can anyone advise on this matter?
Did anyone get an autoscaling instance group to work via CloudFormation?
I have already tried this and raised a request with AWS: this autoscaling feature is not yet available through CloudFormation. For now I use CloudFormation for the custom EMR security group, S3, and so on, and in the Outputs tab I add the command-line command (aws emr create-cluster ...); after the stack completes, I use those outputs in that command to launch the cluster.
Autoscaling can actually be enabled at cluster launch time by using --auto-scaling-role. If we use CloudFormation for EMR, the autoscaling feature is not available because it launches the cluster without --auto-scaling-role.
I hope this can be useful...
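To illustrate launching the cluster outside CloudFormation with the auto-scaling role and an inline scaling policy, here is a rough boto3 sketch. The role names are the AWS defaults; everything else (names, thresholds, subnet) is a placeholder and only meant as an outline:

import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="autoscaling-cluster",
    ReleaseLabel="emr-5.24.0",
    ServiceRole="EMR_DefaultRole",
    JobFlowRole="EMR_EC2_DefaultRole",
    AutoScalingRole="EMR_AutoScaling_DefaultRole",  # the piece CloudFormation omits
    Instances={
        "Ec2SubnetId": "subnet-0123456789abcdef0",  # placeholder
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m4.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m4.xlarge", "InstanceCount": 2,
             # Scale out when available YARN memory drops below 15%.
             "AutoScalingPolicy": {
                 "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
                 "Rules": [{
                     "Name": "ScaleOutOnLowMemory",
                     "Action": {"SimpleScalingPolicyConfiguration": {
                         "AdjustmentType": "CHANGE_IN_CAPACITY",
                         "ScalingAdjustment": 1,
                         "CoolDown": 300}},
                     "Trigger": {"CloudWatchAlarmDefinition": {
                         "ComparisonOperator": "LESS_THAN",
                         "EvaluationPeriods": 1,
                         "MetricName": "YARNMemoryAvailablePercentage",
                         "Period": 300,
                         "Threshold": 15.0,
                         "Unit": "PERCENT"}}}]}},
        ],
    },
)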
Can someone please help? I'm trying to do exactly this: I cannot create an EMR environment with Spark installed from within a Data Pipeline configuration in the AWS console. When I choose 'Run job on an EMR cluster', the cluster is always created with Pig and Hive by default, not Spark.
I understand that I can choose Spark as a bootstrap action as said here, but when I do I get this message:
Name: xxx.xxxxxxx.processing.dp
Build using a template: Run job on an Elastic MapReduce cluster
Parameters:
EC2 key pair (optional): xxx_xxxxxxx_emr_key
EMR step(s):
spark-submit --deploy-mode cluster s3://xxx.xxxxxxx.scripts.bucket/CSV2Parquet.py s3://xxx.xxxxxxx.scripts.bucket/
EMR Release Label: emr-4.3.0
Bootstrap action(s) (optional): s3://support.elasticmapreduce/spark/install-spark,-v,1.4.0.b
Where does the AMI bit go? And does the above look correct?
Here's the error I get when I activate the data pipeline:
Unable to create resource for #EmrClusterObj_2017-01-13T09:00:07 due to: The supplied bootstrap action(s): 'bootstrap-action.6255c495-578a-441a-9d05-d03981fc460d' are not supported by release 'emr-4.3.0'. (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: ValidationException; Request ID: b1b81565-d96e-11e6-bbd2-33fb57aa2526)
If I specify a later version of EMR, do I get Spark installed by default?
Many thanks for any help here.
Regards.
That install-spark bootstrap action is only for 3.x AMI versions. If you are using a releaseLabel (emr-4.x or beyond), the applications to install are specified in a different way.
I myself have never used Data Pipeline, but I see that if, when you are creating a pipeline, you click "Edit in Architect" at the bottom, you can then click on the EmrCluster node and select Applications from the "Add an optional field..." dropdown. That is where you may add Spark.
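Roughly, after that edit the EmrCluster object in the pipeline definition ends up carrying fields like the following (a sketch only; the field names follow the Data Pipeline EmrCluster object reference, and the release label is taken from the question):

# Illustrative fields on the pipeline's EmrCluster object once Spark is added
# via the "Applications" optional field in Architect:
emr_cluster_fields = [
    {"key": "type", "stringValue": "EmrCluster"},
    {"key": "releaseLabel", "stringValue": "emr-4.3.0"},
    # With a 4.x+ release label, applications replace the old install-spark
    # bootstrap action:
    {"key": "applications", "stringValue": "spark"},
]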