aws cli automatically terminates my clusters

I'm running the following command using the AWS CLI v2 on Windows 10:
aws2 emr create-cluster --name "Spark cluster with step" \
--release-label emr-5.24.1 \
--applications Name=Spark \
--log-uri s3://boris-bucket/logs/ \
--ec2-attributes KeyName=boris-key \
--instance-type m5.xlarge \
--instance-count 1 \
--bootstrap-actions Path=s3://boris-bucket/bootstrap_file.sh \
--steps Type=Spark,Name="Spark job",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn] \
--use-default-roles \
--no-auto-terminate
And the EC2 instance still terminates after launching and running for some time (about 5 minutes).
What am I missing? Is there an option somewhere that supersedes my --no-auto-terminate?
EDIT: Okay, I figured it out: my bootstrap actions were not valid. I found that out by looking at the log files for my node.
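For anyone else hitting this: EMR copies each node's bootstrap-action output to the configured --log-uri, so (as a rough sketch, with placeholder cluster and instance IDs) the failing script's stderr can be pulled down like this:
aws s3 ls s3://boris-bucket/logs/j-XXXXXXXXXXXXX/node/
# pick an instance ID from the listing, then fetch the first bootstrap action's stderr
aws s3 cp s3://boris-bucket/logs/j-XXXXXXXXXXXXX/node/i-0123456789abcdef0/bootstrap-actions/1/stderr.gz .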

I have no idea about EMR, but when I checked the CLI specification, it says --no-auto-terminate is a boolean, so you should provide a value:
aws2 emr create-cluster --name "Spark cluster with step" \
... \
--no-auto-terminate true
But the documentation also says that auto-termination is off by default. You should pass --termination-protected to the CLI; then you can inspect the dashboard to see what's happening.
aws2 emr create-cluster --name "Spark cluster with step" \
... \
--no-auto-terminate \
--termination-protected
My guess is that the cluster is failing for some reason; even though you have set ActionOnFailure=CONTINUE, the cluster will still terminate on failures when termination protection is not enabled.
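One way to confirm that guess (a sketch; the cluster ID is a placeholder) is to ask EMR why the cluster went away:
aws2 emr describe-cluster --cluster-id j-XXXXXXXXXXXXX \
--query 'Cluster.Status.StateChangeReason'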
Reference:
https://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/UsingEMR_TerminationProtection.html
Hope this helps. Good Luck.

Related

Error creating an EKS Cluster, name and argument cannot be used at the same time

After trial and error I was able to finally get an EKS cluster up and running and deploy some software through a Helm chart. After bringing the cluster down, I tried to create a new cluster following the exact same process and am hitting an error upon executing the eksctl create command. Any idea what causes this error? I've been unable to find any detailed information about it.
eksctl create cluster \
--name ${eksclustername} \
--version 1.21 \
--node-type r5d.4xlarge \
--nodes 5 \
--nodes-min 0 \
--nodes-max 5 \
--nodegroup-name ${eksnodegropname} \
--region ${eksregion}
Error: --name=My-Cluster and argument us-east-2 cannot be used at the same time

AWS CLI restore-from-cluster-snapshot doesn't find snapshot in account

I'm trying to restore a cluster from a snapshot using
aws redshift restore-from-cluster-snapshot --cluster-identifier my-cluster \
--snapshot-identifier my-identifier --profile my-profile --region my-region
But I'm receiving
An error occurred (ClusterSnapshotNotFound) when calling
the RestoreFromClusterSnapshot operation: Snapshot not found: my-identifier
I checked the available snapshots using
aws redshift describe-cluster-snapshots --profile my-profile --region my-region
And my-identifier appears as available snapshot.
Entering via Redshift console I'm also able to see the snapshots and was able to restore it from the UI.
Does anybody have any clues?
P.S.: Not sure if it's relevant, but it's a snapshot from another account that I shared with the account where I'm trying to restore the cluster.
You must specify the owner account number when restoring to enable Redshift to decrypt the shared snapshot.
aws redshift restore-from-cluster-snapshot \
--profile myAwsCliProfile \
--snapshot-identifier mySnapshotName \
--owner-account 012345678910 \
--cluster-identifier my-new-redshift-cluster \
--number-of-nodes 6 \
--node-type ra3.16xlarge \
--port 5439 \
--region us-east-1 \
--availability-zone us-east-1d \
--cluster-subnet-group-name default \
--availability-zone-relocation \
--no-publicly-accessible \
--maintenance-track-name CURRENT
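If you are unsure which account owns a shared snapshot, the describe call already reports it; a quick sketch (the identifiers are placeholders, and the query keys are standard fields of the Snapshot object):
aws redshift describe-cluster-snapshots --snapshot-identifier my-identifier \
--profile my-profile --region my-region \
--query 'Snapshots[].{Snapshot:SnapshotIdentifier,Owner:OwnerAccount,Status:Status}'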

EMR auto terminate cluster not completing spark application?

I intend to create an auto-terminating EMR cluster that executes a Spark application and shuts down.
If I submit the application as a step to an existing cluster that does not auto-terminate, using the following command, it works and the application completes in 3 minutes.
aws emr add-steps --cluster-id xxx \
--steps Name=imdbetlapp,Jar=command-runner.jar,Args=\
[spark-submit,--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,\
--py-files,s3://bucketname/etl_module.zip,\
--files,s3://bucketname/etl_module/aws_config.cfg,\
s3://bucketname/run_etl.py],ActionOnFailure=CONTINUE --region us-east-1
However, when I use the following command to create an auto-terminating cluster with a step, the application keeps running for more than 30 minutes.
aws emr create-cluster --applications Name=Hadoop Name=Spark --use-default-roles \
--bootstrap-actions Path=s3://bucketname/emr_bootstrap.sh,Name=installPython \
--log-uri s3://logbucketname/elasticmapreduce/ \
--configurations https://s3.amazonaws.com/bucketname/emr_configurations.json \
--steps Name=imdbetlapp,Jar=command-runner.jar,Args=[spark-submit,--deploy-mode,cluster,\
--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,\
--py-files,s3://bucketname/etl_module,\
--files,s3://bucketname/etl_module/aws_config.cfg,s3://bucketname/run_etl.py] \
--release-label emr-5.29.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large \
InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \
--auto-terminate --region us-east-1
What am I missing?
I have zipped my ETL Python module and uploaded it along with the actual folder and the configuration file aws_config.cfg. It works perfectly if submitted as a step to an existing cluster, as I can see output being written to another S3 bucket. However, if I issue a CLI command to create a cluster and execute the step, the step keeps running forever.
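A hedged way to narrow this down would be to check the stuck step's state and its logs under the log URI from the create-cluster command (the cluster and step IDs below are placeholders):
aws emr list-steps --cluster-id j-XXXXXXXXXXXXX --region us-east-1
aws emr describe-step --cluster-id j-XXXXXXXXXXXXX --step-id s-XXXXXXXXXXXX \
--region us-east-1 --query 'Step.Status'
# step logs land under the configured --log-uri
aws s3 ls s3://logbucketname/elasticmapreduce/j-XXXXXXXXXXXXX/steps/s-XXXXXXXXXXXX/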

How to keep Google Dataproc master running?

I created a cluster on Dataproc and it works great. However, after the cluster is idle for a while (~90 min), the master node will automatically stop. This happens to every cluster I have created. I see there is a similar question here: Keep running Dataproc Master node
It looks like it's an initialization action problem. However, the post does not give me enough info to fix the issue. Below is the command I used to create the cluster:
gcloud dataproc clusters create $CLUSTER_NAME \
--project $PROJECT \
--bucket $BUCKET \
--region $REGION \
--zone $ZONE \
--master-machine-type $MASTER_MACHINE_TYPE \
--master-boot-disk-size $MASTER_DISK_SIZE \
--worker-boot-disk-size $WORKER_DISK_SIZE \
--num-workers=$NUM_WORKERS \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
--metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
--metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
--scopes cloud-platform \
--metadata JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
--optional-components=ANACONDA,JUPYTER \
--image-version=1.3
I need the BigQuery connector, GCS connector, Jupyter and DataLab for my cluster.
How can I keep my master node running? Thank you.
As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple of ways to change this behavior:
Upon first creating the Datalab-enabled Dataproc cluster, log in to Datalab and click on the "Idle timeout in about ..." text to disable it, as described at https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer; the text will change to "Idle timeout is disabled".
Edit the initialization action to set the environment variable as suggested by yelsayed:
function run_datalab(){
  # The added -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" is what keeps the
  # container's idle-shutdown process from kicking in.
  if docker run -d --restart always --net=host \
      -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \
      -v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then
    echo 'Cloud Datalab Jupyter server successfully deployed.'
  else
    err 'Failed to run Cloud Datalab'
  fi
}
Then use your custom initialization action instead of the stock gs://dataproc-initialization-actions one. It could also be worth filing a tracking issue in the GitHub repo for Dataproc initialization actions, suggesting that the timeout be disabled by default or exposed via an easy metadata-based option. The auto-shutdown behavior probably isn't what most users expect on a Dataproc cluster, since the master also performs roles other than running the Datalab service.
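For example, a rough sketch of wiring in a custom copy (the bucket name here is just a placeholder; the remaining flags are the same as in the question):
gsutil cp gs://dataproc-initialization-actions/datalab/datalab.sh .
# edit run_datalab() as shown above, then host your modified copy
gsutil cp datalab.sh gs://my-bucket/init-actions/datalab.sh
gcloud dataproc clusters create $CLUSTER_NAME \
... \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://my-bucket/init-actions/datalab.sh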

Bootstrap Failure when trying to install Spark on EMR

I am using this link to install a Spark cluster on EMR (Elastic MapReduce on Amazon): https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923
To create a Spark cluster I run the following command, and my cluster runs into a bootstrap failure every single time. I am not able to resolve this issue, and it would be great if anyone could help me here.
aws emr create-cluster --name SparkCluster --ami-version 3.2 \
--instance-type m3.xlarge --instance-count 3 --ec2-attributes \
KeyName=MYKEY --applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark
SOLVED: Use this:
aws emr create-cluster --name SparkCluster --ami-version 3.7 \
--instance-type m3.xlarge --instance-count 3 --service-role \
EMR_DefaultRole --ec2-attributes \
KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole \
--applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark
A summary of the answer that worked for this user, given the user's SSH key and IAM roles (it took a bit of back and forth in the comments):
aws emr create-cluster --name SparkCluster --ami-version 3.7 --instance-type m3.xlarge --instance-count 3 --service-role EMR_DefaultRole --ec2-attributes KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole --applications Name=Hive --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark
Explanations of EMR IAM roles can be found at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-creatingroles.html and http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-launch-jobflow.html
The 4th point under the section "Spark with YARN on an Amazon EMR cluster" at the link you provide says the following:
Substitute "MYKEY" value for the KeyName parameter with the name of the EC2 key pair you want to use to SSH into the master node of your EMR cluster.
As far as I can see, you have not replaced the MYKEY value with your own EC2 key name. You should try changing it to the name of an existing EC2 key pair you have already created.
In case you still do not have a key pair, you can create one following several methods, one of which is described in this link.
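For what it's worth, a key pair can also be created straight from the CLI; a minimal sketch (the key name here is just an example):
aws ec2 create-key-pair --key-name MYKEY \
--query 'KeyMaterial' --output text > MYKEY.pem
chmod 400 MYKEY.pem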
Update (from the comments below)
From your pictures, it seems there is a problem downloading the bootstrap action file from S3. I am not sure what the cause could be, but you might want to launch EMR with a different AMI version, 3.0 for example.
There is another way to directly start a Spark cluster in EMR.
Step 1 - Go to the EMR section in the AWS console and click on "Create cluster".
Step 2 - Go to bootstrap actions in the configuration and add this line:
s3://support.elasticmapreduce/spark/install-spark
https://www.pinterest.com/pin/429953095652701745/
Step 3 - Click on create cluster
Your cluster will start in minutes :)