AWS EMR script-runner access error - amazon-web-services

I'm running emr-5.12.0, with Amazon 2.8.3, Hive 2.3.2, Hue 4.1.0, Livy 0.4.0, Spark 2.2.1 and Zeppelin 0.7.3 on 1 m4.large as my master node and 1 m4.large as core node.
I am trying to execute a bootstrap action that configures some parts of the cluster. One of these includes the line:
sudo sed -i '/zeppelin.pyspark.python/c\ \"zepplin.pyspark.python\" : \"python3\",' /etc/alternatives/zeppelin-conf/interpreter.json
It makes sure that the Zeppelin uses python3.4 instead of python2.7. It works fine if I execute this in the terminal after SSH'ing to the master node, but it fails when I submit it as a Custom JAR step on the AWS Web interface. I get the following error:
ed: can't read /etc/alternatives/zeppelin-conf/interpreter.json
: No such file or directory
Command exiting with ret '2'
The same thing happens if I use
sudo sed -i '/zeppelin.pyspark.python/c\ \"zepplin.pyspark.python\" : \"python3\",' /etc/zeppelin-conf/interpreter.json
Obviously I could just change it from the Zeppelin UI, but I would like to include it in the bootstrap action.
Thanks!

It turns out that a bootstrap action submitted throug the AWS EMR web interface is submitted as a regular EMR step, so it's only run on the master node. This can be seen if you click the 'AWS CLI export' in the cluster web interface. The intended bootstrap action is listed as a regular step.
Using the command line to launch a cluster with a bootstrap action bypasses this problem, so I've just used that.
Edit: Looking back at the web interface, it's pretty clear that I was adding regular steps instead of bootstrap actions. My bad!

Related

How to make aws cli quiet

When I am using aws cli commands it adds debug data to its output.
Is there a way to make it quiet?
Here is my use case:
# get deployed version
COMMAND="git describe --tags"
aws ecs execute-command --cluster="${CLUSTER}" --task="${TASK}" --container="${SERVICE}" --command="${COMMAND}" --interactive > VERSION
The issue is that instead of expected contents of VERSION file (just the version number):
0.0.67
I have something like that:
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
Starting session with SessionId: ecs-execute-command-123456789abcdefgh
0.0.67
Exiting session with sessionId: ecs-execute-command-123456789abcdefgh.
How can I get rid of the debug data?
I already tried adding --quiet parameter (parameter does not exist)
and redirecting error output, none helped.

What is the correct way of installing a JDBC driver on EMR for Sqoop to use?

I am running Sqoop 1.4.7 on AWS EMR 5.21.1 and am trying to import data from a database. I have successfully been able to do this manually where I create an EMR instance with Sqoop installed via the EMR Console.
Here are the preliminary steps that I performed in order to run sqoop on EMR
Download the JDBC Driver
Move the JDBC driver to the /usr/lib/sqoop/lib directory
I was able to successfully run a sqoop import when I was sshd into an EMR cluster with these commands:
wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar
sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/
When I try to run these commands from an EMR bootstrap script however I get the error:
usr/lib/sqoop/lib/ No such file or directory
After doing some investigation I realized this is because "Bootstrap actions execute before core services, such as Hadoop or Spark, are installed", as found here
So the /usr/lib/sqoop/lib directory doesnt exist when I run my bootstrap steps.
Here are some solutions which work but they feel like work-arounds
Create the /usr/lib/sqoop/lib directory in my bootstrap script and then place the jar in it
Add the jar to this directory as an EMR step. (Turns out this this is the correct approach, look at below accepted answer)
What is the correct way of installing this JDBC driver on EMR?
The 2nd option is the correct way to do it. The documentation explains running bash scripts as an EMR step.
You can also use the jar command-runner.jar and the arguments to be
bash -c "wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar;sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/"

spark cluster on aws emr cant find spark-env.sh

I am playing with apache-spark on aws emr, and trying to use this to set the cluster to use python3,
I use the command as the last command in a bootstrap script
sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
When I use it the cluster crashes during the bootstrap with the following error.
sed: can't read /etc/spark/conf/spark-env.sh: No such file or
directory
How should I set it to use python3 properly?
This is not a duplicate of, My issue is that the cluster is not finding the spark-env.sh file while bootstrapping, while the other question addresses the issue of the system not finding python3
In the end I did not use that script, but Used the EMR configuration file that is available on the creation stage, It gave me the proper configurations via spark_submit (in the aws gui) If you need it to be available for pyspark scripts in a more programatic way, you can use os.environ to set the pyspark python version in the python script

Running updates on EC2s that roll back on failure of status check

I’m setting up a patch process for EC2 servers running a web application.
I need to build an automated process that installs system updates but, reverts back to the last working ec2 instance if the web application fails a status check.
I’ve been trying to do this using an Automation Document in EC2 Systems Manager that performs the following steps:
Stop EC2 instance
Create AMI from instance
Launch new instance from newly created AMI
Run updates
Run status check on web application
If check fails, stop new instance and restart original instance
The Automation Document runs the first 5 steps successfully, but I can't identify how to trigger step 6? Can I do this within the Automation Document? What output would I be able to call from step 5? If it uses aws:runCommand, should the runCommand trigger a new automation document or another AWS tool?
I tried the following to solve this, which more or less worked:
Included an aws:runCommand action in the automation document
This ran the DocumentName "AWS-RunShellScript" with the following parameters:
Downloaded the script from s3:
sudo aws s3 cp s3://path/to/s3/script.sh /tmp/script.sh
Set the file to executable:
chmod +x /tmp/script.sh
Executed the script using variables set in, or generated by the automation document
bash /tmp/script.sh -o {{VAR1}} -n {{VAR2}} -i {{VAR3}} -l {{VAR4}} -w {{VAR5}}
The script included the following getopts command to set the inputted variables:
while getopts o:n:i:l:w: option
do
case "${option}"
in
n) VAR1=${OPTARG};;
o) VAR2=${OPTARG};;
i) VAR3=${OPTARG};;
l) VAR4=${OPTARG};;
w) VAR5=${OPTARG};;
esac
done
The bash script used the variables to run the status check, and roll back to last working instance if it failed.

AWS Elastic Beanstalk - Starting SWF Background Workers

I have been trying to find out the best way to run background jobs using PHP on AWS Elastic beanstalk, and after many hours searching on Google and SO, I believe that one good solution is using SWF and activity workers.
I found this example buried in the aws-sdk-for-php: https://github.com/amazonwebservices/aws-sdk-for-php/tree/master/_samples/AmazonSimpleWorkflow/cron
The read-me file says:
To run this sample, you need to execute three scripts from the command line in separate terminal/console windows
and
Note that the start_cron_example_workflow.php script will exit quickly
while the decider and activity worker scripts keep running until you
manually terminate them.
the decider and activity worker will loop "forever", and trying to run these in EB is what I'm having trouble doing.
In my .ebextensions directory I have a file that executes these files:
container_commands:
01background_task:
command: "php -f start_cron_example_activity_workers.php"
02background_task:
command: "php -f start_cron_example_workflow_workers.php"
But I get the following error messages:
ERROR
Failed to deploy application version.
ERROR
Some instances have not responded to commands. Responses were not received from [i-a5417ed4].
Any way I can do this using config files? How can I make this work in AWS EB without introducing a single point of failure?
Thank you.
You might consider using a service like IronWorker — this is specifically designed for what you are trying to do and will probably work better than putting together your own solution on a micro instance.
I have not used Iron.io yet, but was evaluating it as I am looking to move my stuff over to AWS so I need to have cron jobs handled as well.
Have you taken a look at the Fat Controller ? It can daemonise anything. There's documentation and examples on the website: http://fat-controller.sourceforge.net