Installing Boto3 on AWS EMR: "Failed attempting to download bootstrap action"

I am trying to create a cluster with a bootstrap action that installs boto3 on all nodes, but I always get this error: Master instance failed attempting to download bootstrap action 1 file from S3
My bootstrap file:
sudo pip install boto3
Command to create the cluster:
aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Hue Name=Mahout Name=Pig Name=Tez --ec2-attributes "{\"KeyName\":\"key-ec2\",\"InstanceProfile\":\"EMR_EC2_DefaultRole\",\"SubnetId\":\"subnet-49ad9733\",\"EmrManagedSlaveSecurityGroup\":\"sg-009d9df2b7b6b1302\",\"EmrManagedMasterSecurityGroup\":\"sg-0149cdd6586fe6db5\"}" --service-role EMR_DefaultRole --enable-debugging --release-label emr-5.30.1 --log-uri "s3n://aws-logs-447793603558-us-east-2/elasticmapreduce/" --name "MyCluster" --instance-groups "[{\"InstanceCount\":1,\"EbsConfiguration\":{\"EbsBlockDeviceConfigs\":[{\"VolumeSpecification\":{\"SizeInGB\":32,\"VolumeType\":\"gp2\"},\"VolumesPerInstance\":1}]},\"InstanceGroupType\":\"MASTER\",\"InstanceType\":\"m4.large\",\"Name\":\"Master Instance Group\"},{\"InstanceCount\":2,\"EbsConfiguration\":{\"EbsBlockDeviceConfigs\":[{\"VolumeSpecification\":{\"SizeInGB\":32,\"VolumeType\":\"gp2\"},\"VolumesPerInstance\":1}]},\"InstanceGroupType\":\"CORE\",\"InstanceType\":\"m4.large\",\"Name\":\"Core Instance Group\"}]" --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region us-east-2 --bootstrap-action Path=s3://calculsdistribues/bootstrap-emr.sh
I have already created a cluster successfully without the bootstrap-action option.
What is the mistake here? What should my bootstrap file look like?
Thank you

Make sure the instance profile ("InstanceProfile":"EMR_EC2_DefaultRole") has read access to the S3 bucket that holds your bootstrap script.
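One way to grant that read access is an inline policy on the role behind the instance profile. The following is a minimal sketch, not a vetted policy: the function names and the policy name ReadBootstrapBucket are hypothetical, and it assumes the role has the same name as the instance profile (EMR_EC2_DefaultRole).

```shell
#!/bin/sh
# Print a minimal read-only IAM policy for one bucket (sketch only).
# $1 = bucket name
bootstrap_read_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::$1", "arn:aws:s3:::$1/*"]
    }
  ]
}
EOF
}

# Attach the policy inline to a role.
# $1 = role name, $2 = bucket name
# "ReadBootstrapBucket" is a hypothetical policy name.
grant_bootstrap_read() {
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name ReadBootstrapBucket \
    --policy-document "$(bootstrap_read_policy "$2")"
}
```

With the names from the question this would be called as: grant_bootstrap_read EMR_EC2_DefaultRole calculsdistribues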

After looking in the logs, I found this error:
The bucket is in this region: eu-west-1. Please use this region to retry the request
The problem was that the S3 bucket was created in one region and the cluster in another.
I created the cluster in the same region as the bucket and it worked.
Thanks
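A region mismatch like this can be caught before launching the cluster. The sketch below compares the bucket's region with the intended cluster region. The function names are hypothetical, and the mapping of "None" to us-east-1 and "EU" to eu-west-1 is an assumption about how get-bucket-location reports those two cases.

```shell
#!/bin/sh
# Map the raw LocationConstraint value to a region name.
# Assumption: the CLI prints "None" for buckets in us-east-1 and may
# print the legacy value "EU" for some eu-west-1 buckets.
normalize_bucket_region() {
  case "$1" in
    None|null) echo "us-east-1" ;;
    EU)        echo "eu-west-1" ;;
    *)         echo "$1" ;;
  esac
}

# Print the region of a bucket. $1 = bucket name
bucket_region() {
  _raw=$(aws s3api get-bucket-location --bucket "$1" \
    --query LocationConstraint --output text) || return 1
  normalize_bucket_region "$_raw"
}

# Return 0 if bucket $1 is in region $2; otherwise warn and return 1.
# Returns 2 if the lookup itself fails.
check_same_region() {
  _found=$(bucket_region "$1") || return 2
  if [ "$_found" = "$2" ]; then
    return 0
  fi
  echo "bucket $1 is in $_found, cluster region is $2" >&2
  return 1
}
```

For the setup in the question, check_same_region calculsdistribues us-east-2 would be the call to run before create-cluster.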

Related

AWS EMR - Terminated with errors On the master instance application provisioning failed

I'm provisioning an EMR cluster on emr-5.30.0. I run this using Terraform and get the following error in the AWS console when it fails.
Amazon EMR Cluster j-11I5FOBxxxxxx has terminated with errors at 2020-10-26 19:51 UTC with a reason of BOOTSTRAP_FAILURE.
I don't have any bootstrap steps. I can't view any logs to see what is happening either: the Log URI is blank, and I can't SSH to the cluster because it has terminated.
Any pointers would be appreciated.
Providing AWS-CLI-EXPORT output:
aws emr create-cluster --auto-scaling-role EMR_AutoScaling_DefaultRole --applications Name=Spark --tags 'Account=xxx' 'Function=xxx' 'Repository=' 'Mail=xxx#xxx.com' 'Slack=xxx' 'Builder=xxx' 'Environment=xxx' 'Service=xxx xxx xxx' 'Team=xxx' 'Name=xxx-xxx-xxx' --ebs-root-volume-size 100 --ec2-attributes '{"KeyName":"xxx","AdditionalSlaveSecurityGroups":[""],"InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"sg-xxx","SubnetId":"subnet-xxx","EmrManagedSlaveSecurityGroup":"sg-xxx","EmrManagedMasterSecurityGroup":"sg-xxx","AdditionalMasterSecurityGroups":[""]}' --service-role EMR_DefaultRole --release-label emr-5.30.0 --name 'xxx-xxx-xxx' --instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":4}]},"InstanceGroupType":"MASTER","InstanceType":"m5.2xlarge","Name":""},{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":40,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.2xlarge","Name":""}]' --configurations '[{"Classification":"hadoop-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"PYSPARK_PYTHON":"/usr/bin/python3","JAVA_HOME":"/usr/lib/jvm/java-1.8.0"}}]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"PYSPARK_PYTHON":"/usr/bin/python3","JAVA_HOME":"/usr/lib/jvm/java-1.8.0"}}]}]' --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region eu-west-2
The issue was that JAVA_HOME was set incorrectly:
"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"
Resolution: check the logs in S3 under provision-node/reports; they should tell you which bootstrap step fails.
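As a rough sketch of that log check: the helpers below print the termination reason and list any provisioning reports under the cluster's log prefix. The function names are hypothetical, and the second helper assumes the cluster was created with a log URI, which was not the case in the question.

```shell
#!/bin/sh
# Show why a cluster terminated. $1 = cluster id, $2 = region
cluster_failure_reason() {
  aws emr describe-cluster --cluster-id "$1" --region "$2" \
    --query 'Cluster.Status.StateChangeReason' --output json
}

# List provisioning reports under the cluster's log prefix.
# $1 = log URI (for example s3://my-log-bucket/elasticmapreduce)
# $2 = cluster id
# Assumption: logging to S3 was enabled when the cluster was created.
list_provision_reports() {
  aws s3 ls --recursive "${1%/}/$2/" | grep 'provision-node/reports'
}
```

Any report listed this way can then be downloaded with aws s3 cp and inspected for the failing step.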
Try changing the instance type and running the cluster in a different AZ to see whether the problem persists.
Building a cluster with emr-6.2.0 on md5.xlarge, this is JAVA_HOME:
/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64

EMR auto terminate cluster not completing spark application?

I intend to create an auto-terminating EMR cluster that executes a Spark application and then shuts down.
If I submit the application as a step to an existing cluster that does not auto-terminate, using the following command, it works and the application completes in 3 minutes.
aws emr add-steps --cluster-id xxx \
--steps Name=imdbetlapp,Jar=command-runner.jar,Args=\
[spark-submit,--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,\
--py-files,s3://bucketname/etl_module.zip,\
--files,s3://bucketname/etl_module/aws_config.cfg,\
s3://bucketname/run_etl.py],ActionOnFailure=CONTINUE --region us-east-1
However, when I use the following command to create an auto-terminating cluster with a step, the application keeps running for more than 30 minutes.
aws emr create-cluster --applications Name=Hadoop Name=Spark --use-default-roles \
--bootstrap-actions Path=s3://bucketname/emr_bootstrap.sh,Name=installPython \
--log-uri s3://logbucketname/elasticmapreduce/ \
--configurations https://s3.amazonaws.com/bucketname/emr_configurations.json \
--steps Name=imdbetlapp,Jar=command-runner.jar,Args=[spark-submit,--deploy-mode,cluster,\
--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,\
--py-files,s3://bucketname/etl_module,\
--files,s3://bucketname/etl_module/aws_config.cfg,s3://bucketname/run_etl.py] \
--release-label emr-5.29.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large \
InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \
--auto-terminate --region us-east-1
What am I missing?
I have zipped my ETL Python module and uploaded it along with the actual folder and the configuration file aws_config.cfg. It works perfectly when submitted as a step to an existing cluster, as I can see output being written to another S3 bucket. However, if I issue a CLI command to create a cluster and execute the step, the step keeps executing forever.
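Comparing the two commands in the question, one visible difference is the --py-files value: the working step passes s3://bucketname/etl_module.zip, while the create-cluster step passes s3://bucketname/etl_module. Whether this is what makes the step hang is not confirmed in the thread. As a sketch, differences like this can be spotted by diffing the two Args lists (the function name is hypothetical):

```shell
#!/bin/sh
# Print the arguments that appear in only one of two comma-separated lists.
# $1 = Args of the working step, $2 = Args of the hanging step
# Entries only in $1 are printed unindented, entries only in $2 are
# indented by a tab (this is how comm formats its columns).
diff_step_args() {
  _a=$(mktemp) && _b=$(mktemp) || return 2
  printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort > "$_a"
  printf '%s\n' "$2" | tr ',' '\n' | LC_ALL=C sort > "$_b"
  # comm -3 suppresses the lines common to both files
  LC_ALL=C comm -3 "$_a" "$_b"
  rm -f "$_a" "$_b"
}
```

Passing the contents of the two Args=[...] lists as the two parameters prints only the entries that differ.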

Initialize AWS EC2 machine with access keys on launch

I want to launch an EC2 machine using aws cli. I want several things to take place before I connect, including setting my configuration.
I successfully launch the machine using:
aws ec2 run-instances --image-id ami-062f7200baf2fa504 --count 1 \
--instance-type t2.micro --key-name MyFirstKey --security-group-ids \
launch-wizard-3 --user-data file://aws_setup_script.txt
My aws_setup_script.txt is:
sudo yum update -y
aws configure set aws_access_key_id AAAAABBBBBCCCCCDDDDD
aws configure set aws_secret_access_key AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHH
aws configure set default.region us-east-1
The sudo yum update -y step runs successfully, but the aws configure steps do not.
Passing secrets in user-data is insecure.
Your script is failing because it isn't running as ec2-user, so it doesn't have aws in the path. Even if it worked, it wouldn't configure the CLI tool for the ec2-user account, so it isn't going to work the way you want.
Most importantly, there is a much better way to accomplish this. You should assign an IAM instance profile to the instance. When you run the aws cli tool on an instance with an IAM role assigned, it automatically uses those credentials.
As a best practice, it is always better to use an IAM instance role attached to your EC2 instance than to set AWS credentials on the instance.
Create an IAM instance role (refer to the AWS docs) with the permissions you want to give to the EC2 instance.
Use --iam-instance-profile in the aws cli command to attach the IAM role to the EC2 instance.
aws ec2 run-instances --image-id ami-062f7200baf2fa504 --count 1 \
--instance-type t2.micro --key-name MyFirstKey --security-group-ids \
launch-wizard-3 --iam-instance-profile Name=<instance-profile-name>
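Creating the role and instance profile from the CLI could look roughly like the sketch below. The function names are hypothetical, the role and the instance profile are assumed to share one name, and no permissions are attached here.

```shell
#!/bin/sh
# Print a trust policy that lets EC2 assume the role.
ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create a role and an instance profile with the same name and link them.
# $1 = name to use for both (hypothetical, for example MyEc2Role)
create_instance_profile() {
  aws iam create-role --role-name "$1" \
    --assume-role-policy-document "$(ec2_trust_policy)" &&
  aws iam create-instance-profile --instance-profile-name "$1" &&
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$1" --role-name "$1"
}
```

After create_instance_profile MyEc2Role, the instance can be launched with --iam-instance-profile Name=MyEc2Role. The role still needs a permissions policy (for example via aws iam attach-role-policy) before the CLI on the instance can do anything useful.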

Passing hive configuration with aws emr cli

I am following this doc: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-dev-create-metastore-outside.html and trying to create an EMR cluster using awscli==1.10.38.
I use the following command as mentioned in the documentation:
aws emr create-cluster --release-label emr-5.0.0 --instance-type m3.xlarge --instance-count 2 \
--applications Name=Hive --configurations ./hiveConfiguration.json --use-default-roles
I am also using exactly the same hiveConfiguration.json as in the documentation, but the command fails with "aws: error: invalid json argument for option --configurations".
Why do I get the error?
Your argument to --configurations is incorrect: the file:// prefix is missing. The CLI needs to know that you are specifying a file or S3 object rather than inline JSON.
aws emr create-cluster --configurations file://hiveConfiguration.json
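To avoid forgetting the prefix, a small helper can add it when a bare path is given. This is only a sketch: the function name is hypothetical, and the list of values it leaves untouched (inline JSON and values that already start with file://, fileb://, http:// or https://) is an assumption about what one would want to pass through.

```shell
#!/bin/sh
# Turn a bare local path into a file:// parameter. Inline JSON and values
# that already carry a scheme are printed unchanged.
# $1 = value intended for an option such as --configurations
as_cli_file_param() {
  case "$1" in
    file://*|fileb://*|http://*|https://*|\[*|\{*) printf '%s\n' "$1" ;;
    *) printf 'file://%s\n' "$1" ;;
  esac
}
```

It would be used as --configurations "$(as_cli_file_param ./hiveConfiguration.json)" in the create-cluster command.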

Bootstrap Failure when trying to install Spark on EMR

I am using this article to install a Spark cluster on EMR (Amazon Elastic MapReduce): https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923
To create the Spark cluster I run the following command, and the cluster runs into a bootstrap failure every single time. I have not been able to resolve this issue, and it would be great if anyone could help me here.
aws emr create-cluster --name SparkCluster --ami-version 3.2 \
--instance-type m3.xlarge --instance-count 3 --ec2-attributes \
KeyName=MYKEY --applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark
SOLVED: use this:
aws emr create-cluster --name SparkCluster --ami-version 3.7 \
--instance-type m3.xlarge --instance-count 3 --service-role \
EMR_DefaultRole --ec2-attributes \
KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole \
--applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark
Summary of the answer (it took a bit of back and forth in comments) that worked for this user given the user's SSH key and IAM roles:
aws emr create-cluster --name SparkCluster --ami-version 3.7 --instance-type m3.xlarge --instance-count 3 --service-role EMR_DefaultRole --ec2-attributes KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole --applications Name=Hive --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark
Explanations of EMR IAM roles can be found at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-creatingroles.html and http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-launch-jobflow.html
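If the two roles named in the command do not exist yet in the account, the CLI can create them. A minimal sketch, with a hypothetical function name, that only creates them when one is missing:

```shell
#!/bin/sh
# Create the default EMR roles only if either one is missing.
ensure_emr_default_roles() {
  if aws iam get-role --role-name EMR_DefaultRole >/dev/null 2>&1 &&
     aws iam get-role --role-name EMR_EC2_DefaultRole >/dev/null 2>&1; then
    echo "default EMR roles already exist"
  else
    aws emr create-default-roles
  fi
}
```

Running ensure_emr_default_roles once before create-cluster should be enough; the check treats any failure of get-role (including missing IAM permissions) as "role missing".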
The 4th point under the section Spark with YARN on an Amazon EMR cluster at the link you provide says the following:
Substitute "MYKEY" value for the KeyName parameter with the name of the EC2 key pair you want to use to SSH into the master node of your EMR cluster.
As far as I can see, you have not replaced MYKEY with your own EC2 key name. Try changing its value to the name of an EC2 key pair you have already created.
If you do not have a key pair yet, you can create one in several ways, one of which is described in this link.
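Checking for the key pair, and creating it if needed, can also be done from the CLI. The sketch below uses hypothetical function names and writes the private key to a .pem file in the current directory, which is an assumption about where you want to keep it.

```shell
#!/bin/sh
# Succeed if an EC2 key pair with this name exists in the given region.
# $1 = key pair name, $2 = region
key_pair_exists() {
  aws ec2 describe-key-pairs --key-names "$1" --region "$2" >/dev/null 2>&1
}

# Create the key pair if it is missing and save the private key as <name>.pem.
# $1 = key pair name, $2 = region
ensure_key_pair() {
  if key_pair_exists "$1" "$2"; then
    echo "key pair $1 already exists in $2"
  else
    aws ec2 create-key-pair --key-name "$1" --region "$2" \
      --query KeyMaterial --output text > "$1.pem" &&
    chmod 400 "$1.pem"
  fi
}
```

The name passed here is the one to use for KeyName in --ec2-attributes, and the region must match the region of the cluster.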
Update (from the comments below)
From your pictures, it seems there is a problem downloading the bootstrap action file from S3. I am not sure what the cause of the problem could be, but you might want to change the AMI and launch EMR with a different AMI version, 3.0, for example.
There is another way to start a Spark cluster in EMR directly.
Step 1 - Go to the EMR section in the AWS console and click Create cluster.
Step 2 - Go to Bootstrap actions in the configuration and add this line:
s3://support.elasticmapreduce/spark/install-spark
https://www.pinterest.com/pin/429953095652701745/
Step 3 - Click Create cluster.
Your cluster will start in minutes :)