EMR auto terminate cluster not completing spark application?

EMR auto terminate cluster not completing spark application? - amazon-web-services

I intend to create an auto terminating EMR cluster that executes a spark cluster and shuts down.
If I submit the application as a step to an existing cluster that does not auto terminate using the following command, it works and application completes in 3 minutes.
aws emr add-steps --cluster-id xxx \
--steps Name=imdbetlapp,Jar=command-runner.jar,Args=\
[spark-submit,--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,\
--py-files,s3://bucketname/etl_module.zip,\
--files,s3://bucketname/etl_module/aws_config.cfg,\
s3://bucketname/run_etl.py],ActionOnFailure=CONTINUE --region us-east-1
However, when i use the following command to create an auto terminating cluster with a step function, the application keeps running for more that 30 minutes.
aws emr create-cluster --applications Name=Hadoop Name=Spark --use-default-roles \
--bootstrap-actions Path=s3://bucketname/emr_bootstrap.sh,Name=installPython \
--log-uri s3://logbucketname/elasticmapreduce/ \
--configurations https://s3.amazonaws.com/bucketname/emr_configurations.json \
--steps Name=imdbetlapp,Jar=command-runner.jar,Args=[spark-submit,--deploy-mode,cluster,\
--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,\
--py-files,s3://bucketname/etl_module,\
--files,s3://bucketname/etl_module/aws_config.cfg,s3://bucketname/run_etl.py] \
--release-label emr-5.29.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large \
InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \
--auto-terminate --region us-east-1
What am I missing out?
I have zipped my etl python module and uploaded that along with the actual folder and configuration file aws_config.cfg. It works perfectly if submitted as a step function to existing cluster as I can see output being written to another S3 bucket. However, if I issue a CLI command to create a cluster and execute the step the step keeps executing forever.

Related

AWS EMR - Terminated with errors On the master instance application provisioning failed

I'm provisioning an EMR cluster emr-5.30.0. I run this using Terraform and get the following error on AWS CONSOLE as it fails.
Amazon EMR Cluster j-11I5FOBxxxxxx has terminated with errors at 2020-10-26 19:51 UTC with a reason of BOOTSTRAP_FAILURE.
I don't have any bootstrap steps. I can't view any logs either to see what is happening. Log URI is blank and can't SSH to cluster too since it's terminated.
Any pointers would be appreciated?
Providing AWS-CLI-EXPORT output:
aws emr create-cluster --auto-scaling-role EMR_AutoScaling_DefaultRole --applications Name=Spark --tags 'Account=xxx' 'Function=xxx' 'Repository=' 'Mail=xxx#xxx.com' 'Slack=xxx' 'Builder=xxx' 'Environment=xxx' 'Service=xxx xxx xxx' 'Team=xxx' 'Name=xxx-xxx-xxx' --ebs-root-volume-size 100 --ec2-attributes '{"KeyName":"xxx","AdditionalSlaveSecurityGroups":[""],"InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"sg-xxx","SubnetId":"subnet-xxx","EmrManagedSlaveSecurityGroup":"sg-xxx","EmrManagedMasterSecurityGroup":"sg-xxx","AdditionalMasterSecurityGroups":[""]}' --service-role EMR_DefaultRole --release-label emr-5.30.0 --name 'xxx-xxx-xxx' --instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":4}]},"InstanceGroupType":"MASTER","InstanceType":"m5.2xlarge","Name":""},{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":40,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.2xlarge","Name":""}]' --configurations '[{"Classification":"hadoop-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"PYSPARK_PYTHON":"/usr/bin/python3","JAVA_HOME":"/usr/lib/jvm/java-1.8.0"}}]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"PYSPARK_PYTHON":"/usr/bin/python3","JAVA_HOME":"/usr/lib/jvm/java-1.8.0"}}]}]' --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region eu-west-2

Issue was due to JAVA_HOME incorrectly set.
JAVA_HOME":"/usr/lib/jvm/java-1.8.0"
Resolution: Check logs in S3 under: provision-node/reports and it should tell you which bootstrap step fails...

Try to change the instance type and try running it in different AZ and see if problem persists.

Building a cluster with emr-6.2.0 on md5.xlarge, this is JAVA_HOME:
/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64

Install Boto3 AWS EMR Failed attempting to download bootstrap action

I am trying to create my cluster using bootstrap actions option (which install boto3 on all nodes), but getting always Master instance failed attempting to download bootstrap action 1 file from S3
my bootstrapfile:
sudo pip install boto3
Command to create cluster :
aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Hue Name=Mahout Name=Pig Name=Tez --ec2-attributes "{\"KeyName\":\"key-ec2\",\"InstanceProfile\":\"EMR_EC2_DefaultRole\",\"SubnetId\":\"subnet-49ad9733\",\"EmrManagedSlaveSecurityGroup\":\"sg-009d9df2b7b6b1302\",\"EmrManagedMasterSecurityGroup\":\"sg-0149cdd6586fe6db5\"}" --service-role EMR_DefaultRole --enable-debugging --release-label emr-5.30.1 --log-uri "s3n://aws-logs-447793603558-us-east-2/elasticmapreduce/" --name "MyCluster" --instance-groups "[{\"InstanceCount\":1,\"EbsConfiguration\":{\"EbsBlockDeviceConfigs\":[{\"VolumeSpecification\":{\"SizeInGB\":32,\"VolumeType\":\"gp2\"},\"VolumesPerInstance\":1}]},\"InstanceGroupType\":\"MASTER\",\"InstanceType\":\"m4.large\",\"Name\":\"Master Instance Group\"},{\"InstanceCount\":2,\"EbsConfiguration\":{\"EbsBlockDeviceConfigs\":[{\"VolumeSpecification\":{\"SizeInGB\":32,\"VolumeType\":\"gp2\"},\"VolumesPerInstance\":1}]},\"InstanceGroupType\":\"CORE\",\"InstanceType\":\"m4.large\",\"Name\":\"Core Instance Group\"}]" --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region us-east-2 --bootstrap-action Path=s3://calculsdistribues/bootstrap-emr.sh
I already created successfuly cluster without the bootstrap-action option.
What is the mistake here ? how my bootstrap file should looks like ?
Thank you

Make sure you have given read access to s3 bucket where your bootstrap script is present for the Instace profile "InstanceProfile\":\"EMR_EC2_DefaultRole

after looking in the logs, I found this error :
The bucket is in this region: eu-west-1. Please use this region to retry the request
The problem was that S3 bucket was created in a region and the cluster was created in another region.
I just created the cluster in the same region and it's worked.
thanks

aws cli automatically terminates my clusters

I'm running the following command using aws cli v2 on Windows 10
aws2 emr create-cluster --name "Spark cluster with step" \
--release-label emr-5.24.1 \
--applications Name=Spark \
--log-uri s3://boris-bucket/logs/ \
--ec2-attributes KeyName=boris-key \
--instance-type m5.xlarge \
--instance-count 1 \
--bootstrap-actions Path=s3://boris-bucket/bootstrap_file.sh \
--steps Type=Spark,Name="Spark job",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn] \
--use-default-roles \
--no-auto-terminate
And the EC2 instance still terminates after launching and running for some time (about 5 minutes).
What am I missing? is there an option somewhere which supersedes my --no-auto-terminate?
EDIT: okay I figured it out, my bootstrap-actions were not valid. I found that out by looking at the log file for my node

I have no idea about emr, but when i checked the cli specification, it says --no-auto-terminate is a boolean. you should provide a value.
aws2 emr create-cluster --name "Spark cluster with step" \
...
--no-auto-terminate true
But the documentation also says, the auto termination is off by default. You should pass --termination-protected to the cli, then you can inspect the dashboard to see whats happening.
aws2 emr create-cluster --name "Spark cluster with step" \
...
--no-auto-terminate
--termination-protected
Because my guess is that the cluster is failing for some reason, even though you have set ActionOnFaulure = Continue, the cluster will terminate for failures when the Termination Protection is not enabled.
Reference:
https://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/UsingEMR_TerminationProtection.html
Hope this helps. Good Luck.

Passing hive configuration with aws emr cli

I am following doc: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-dev-create-metastore-outside.html and trying to create emr cluster using the awscli==1.10.38 .
I use the following command as mentioned in the documentation:
aws emr create-cluster --release-label emr-5.0.0 --instance-type m3.xlarge --instance-count 2 \
--applications Name=Hive --configurations ./hiveConfiguration.json --use-default-roles
I am also using the exact same hiveConfiguration.json as mentioned in the document.
but it says "aws: error: invalid json argument for option --configurations"
Why do I get the error?

Your argument to --configurations is incorrect. Missing file:// CLI needs to know you are specifying a file or S3 object.
aws emr create-cluster --configurations file://hiveConfiguration.json

Bootstrap Failure when trying to install Spark on EMR

I am using this link to install Spark Cluster on EMR(Elastic Map Reduce on Amazon) https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923
For creating a Spark cluster I run the following command and my cluster is running into bootstrap failure every single time. I am not able to resolve this issue, and it will be great if any could help me here.
aws emr create-cluster --name SparkCluster --ami-version 3.2 \
--instance-type m3.xlarge --instance-count 3 --ec2-attributes \
KeyName=MYKEY --applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark
SOLVED : Use this:
aws emr create-cluster --name SparkCluster --ami-version 3.7 \
--instance-type m3.xlarge --instance-count 3 --service-role \
EMR_DefaultRole --ec2-attributes \
KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole \
--applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark

Summary of the answer (it took a bit of back and forth in comments) that worked for this user given the user's SSH key and IAM roles:
aws emr create-cluster --name SparkCluster --ami-version 3.7 --instance-type m3.xlarge --instance-count 3 --service-role EMR_DefaultRole --ec2-attributes KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole --applications Name=Hive --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark
Explanations of EMR IAM roles can be found at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-creatingroles.html and http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-launch-jobflow.html

The 4th point under the section Spark with YARN on an Amazon EMR cluster at the link you provide says the following:
Substitute "MYKEY" value for the KeyName parameter with the name of the EC2 key pair you want to use to SSH into the master node of your EMR cluster.
As far as I can see, you have not changed the value of MYKEY for your own EC2 key name. You should try changing its value to an existing EC2 key name you have already created.
In case you still do not have a keypair, you can created following several methods, one of which is described in this link.
Update (from the comments below)
From your pictures, it seems there is a problem downloading the bootstrap action file from S3. I am not sure what the cause of the problem could be, but you might want to change the AMI and launch EMR with a different AMI version, 3.0, for example.

There is another way to directly start spark cluster in EMR.
Step 1 - Go to the EMR section in aws and click on create cluster.
Step 2 - Go to bootstrap actions in the configuration and add this line
s3://support.elasticmapreduce/spark/install-spark
https://www.pinterest.com/pin/429953095652701745/
Step 3 - Click on create cluster
Your cluster will start in minutes :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

EMR auto terminate cluster not completing spark application? - amazon-web-services

Related

AWS EMR - Terminated with errors On the master instance application provisioning failed

Install Boto3 AWS EMR Failed attempting to download bootstrap action

aws cli automatically terminates my clusters

Passing hive configuration with aws emr cli

Bootstrap Failure when trying to install Spark on EMR

Categories

Resources