Unable to execute a step on a running EMR - amazon-web-services

I have an EMR 5.28.1 cluster running in AWS, but I forgot to install some Python libraries as part of the bootstrap action. Now that the cluster is running, I tried to add a step via the EMR console. Here are my settings:
JAR: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Main class: None
Arguments: s3://xxxx/install_python_libraries.sh
Unfortunately, I get the following error.
Cannot run program "s3://xxxxx/install_python_libraries.sh" (in directory "."): error=2, No such file or directory
I am not sure what I am doing wrong. The shell script looks like this.
#!/bin/bash -xe
# Non-standard and non-Amazon Machine Image Python modules:
sudo pip-3.6 install boto3
sudo pip-3.6 install xmltodict
I also tried this using 'command-runner.jar' instead, but I get the same error. Can you please help me figure out the problem so I can do this via the console? I would like to install the libraries on all nodes, master and core.
Thanks

The issue is the end-of-line (EOL) type of the .sh file.
If the file uses Windows line endings ("\r\n"), it will not work and the step fails with the "No such file or directory" error.
Convert it to Unix line endings ("\n") with something like Notepad++ and it will run fine.
(In Notepad++: Edit > EOL Conversion > Unix (LF), then save and try again.)
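If Notepad++ is not at hand, the same conversion can be done from a terminal. The following is a minimal sketch and not part of the original answer: strip_cr and has_cr are hypothetical helper names, and it assumes a POSIX shell with tr and mktemp available. It deletes every carriage return in the file, which is fine for a short script like the one in the question but would also remove any lone "\r" characters that were intentional.

```shell
#!/bin/sh
# Sketch: strip Windows carriage returns from a script before uploading it to S3.
# strip_cr and has_cr are hypothetical helper names, not part of any AWS tooling.

# has_cr FILE: succeeds (exit 0) if FILE contains at least one carriage return.
has_cr() {
    # tr -cd keeps only CR bytes; a non-empty result means CRs are present.
    [ -n "$(tr -cd '\r' < "$1")" ]
}

# strip_cr FILE: rewrite FILE in place with every carriage return removed.
strip_cr() {
    src="$1"
    tmp="$(mktemp)" || return 1
    # Write the cleaned copy to a temp file, then copy it back so that
    # FILE keeps its original permissions.
    tr -d '\r' < "$src" > "$tmp" && cat "$tmp" > "$src"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (file name taken from the question):
#   has_cr install_python_libraries.sh && strip_cr install_python_libraries.sh
```

After converting, the script has to be uploaded to S3 again before the step is re-run.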

Related

Jenkins post build python script file

I am new to Jenkins, especially to running Python scripts in Jenkins. The problem I am facing is as follows:
I am trying to run a Python script from a file in the post-build step of a Jenkins job. To my understanding, I have added all the plugins required for that, i.e. the PostBuildScript plugin, the Python Jenkins plugin, etc.
Now when I build, the console output shows that an invalid script command caused the failure. I have attached the results below. Can anybody help me with that, please?
In the post-build step I am providing the full (absolute) path to the Python script file, i.e.
[Screenshot: Execute Python script path]
[Screenshot: Results]
It may be useful to mention that I have also tried using just the path without "python" preceding it, and tried both forward and backward slashes in the path, without any success.
I have managed to resolve the issue. There are two parts to the solution:
First, if you want to run a simple Python script in post-build, add a post-build step for "Execute Python script" (this requires installing the post-build plugin). In the window created after adding the post-build step you can simply enter any Python command to run.
Second, if you would like to run a list of commands from a Python script file in the same post-build step window, make sure to put all the Python files you want to execute into the Jenkins workspace > project directory (the project for which Jenkins is running).
Moreover, for Python 2.7, to execute the script file you simply need to write the script as
execfile("file.py")
One more thing to remember is to add the python.exe path to the environment variables.

E: Malformed entry 7 in list file /etc/apt/sources.list.d/google-cloud-sdk.list (Suite) E: The list of sources could not be read

I get this error when running sudo apt-get update:
E: Malformed entry 7 in list file /etc/apt/sources.list.d/google-cloud-sdk.list (Suite)
E: The list of sources could not be read.
I tried to use sed to remove the entry, with no luck.
Please help.
Okay, after following the first 5 steps in the link: cloud.google.com/sdk/docs/quickstart-debian-ubuntu I received the following output:
Your Google Cloud SDK is configured and ready to use!
Commands that require authentication will use cloud#postaprayer.org by default
Commands will reference project post-a-prayer by default
Compute Engine commands will use region us-west2 by default
Compute Engine commands will use zone us-west2-a by default
Run gcloud help config to learn how to change individual settings
This gcloud configuration is called [postaprayerdns]. You can create additional configurations if you work with multiple accounts and/or projects.
Run gcloud topic configurations to learn more.
Some things to try next:
Run gcloud --help to see the Cloud Platform services you can interact with. And run gcloud help COMMAND to get help on any gcloud command.
Run gcloud topic --help to learn about advanced features of the SDK like arg files and output formatting
Okay. I was able to open the google-cloud-sdk.list file for editing with sudo nano /etc/apt/sources.list.d/google-cloud-sdk.list
From there I deleted line 7 (which read "clear") and saved the file.
I adapted these instructions to solve the error: https://askubuntu.com/questions/332669/unable-to-edit-etc-apt-sources-list-file
Solved. I used nano to edit the .list file, deleted the corrupt entry 7, and then saved.
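The same fix can be done without an interactive editor. This is a rough sketch rather than what the poster ran: delete_line is a hypothetical helper name, and it assumes that "entry 7" in the apt error corresponds to line 7 of the file, as it did here. It keeps a backup outside /etc/apt/sources.list.d so that apt does not try to read the backup as well.

```shell
#!/bin/sh
# Sketch: delete one line from an apt sources list, keeping a backup copy.
# delete_line is a hypothetical helper name; run it as root for files under /etc/apt.

# delete_line FILE N BACKUP: copy FILE to BACKUP, then rewrite FILE without line N.
delete_line() {
    file="$1"
    n="$2"
    backup="$3"
    cp "$file" "$backup" || return 1
    # The sed command "Nd" drops line N and prints every other line unchanged.
    sed "${n}d" "$backup" > "$file"
}

# Usage (path and line number taken from the question; the backup path is an example):
#   sed -n '7p' /etc/apt/sources.list.d/google-cloud-sdk.list    # inspect line 7 first
#   delete_line /etc/apt/sources.list.d/google-cloud-sdk.list 7 /root/google-cloud-sdk.list.bak
#   apt-get update
```

Printing the line before deleting it is worth the extra step, since removing the wrong entry would silently drop a valid package source.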

Spark cluster on AWS EMR can't find spark-env.sh

I am playing with Apache Spark on AWS EMR and trying to set the cluster to use Python 3.
I use the following command as the last command in a bootstrap script:
sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
When I use it, the cluster crashes during the bootstrap with the following error:
sed: can't read /etc/spark/conf/spark-env.sh: No such file or directory
How should I set it to use Python 3 properly?
This is not a duplicate: my issue is that the cluster does not find the spark-env.sh file while bootstrapping, whereas the other question addresses the system not finding python3.
In the end I did not use that script. Instead I used the EMR configuration file that is available at the cluster-creation stage, which gave me the proper configuration via spark-submit (in the AWS GUI). If you need it to be available for PySpark scripts in a more programmatic way, you can use os.environ to set the PySpark Python version in the Python script.
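The answer does not show the configuration it used. One shape that AWS documents for pointing Spark on EMR at Python 3 is the spark-env classification with a nested export classification, sketched below. The helper name write_spark_python3_config and the file name spark-python3.json are made up for illustration. As far as I know, bootstrap actions run before EMR installs applications such as Spark, which would explain why /etc/spark/conf/spark-env.sh did not exist yet when the sed command ran.

```shell
#!/bin/sh
# Sketch: write an EMR configuration file that exports PYSPARK_PYTHON for Spark.
# write_spark_python3_config is a hypothetical helper name.

# write_spark_python3_config FILE: write the JSON configuration to FILE.
write_spark_python3_config() {
    cat > "$1" <<'EOF'
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3"
        }
      }
    ]
  }
]
EOF
}

# Usage (the file name is an example):
#   write_spark_python3_config spark-python3.json
# The JSON can then be pasted into the software settings field when creating
# the cluster in the console, or passed on the command line together with the
# other cluster options via:
#   --configurations file://spark-python3.json
```

Because the setting is applied by EMR itself when it configures Spark, no bootstrap script needs to touch spark-env.sh.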

Google cloud compute startup script ignored with no logging

I have a standard Debian 8.9 instance on google cloud compute (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown, and the Rscript is ignored. Removing the shutdown line lets the GCE instance start, but the Rscript is still ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance.
I got in touch with Google cloud support, who gave the following response:
The script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with:
$ sudo google_metadata_script_runner --script-type startup   # for Debian
or
$ sudo /usr/share/google/run-startup-scripts   # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database. The startup-script was running before autossh. Building a 40 second delay into the script and running the script as a user (not sudo-type root) seems to have solved this problem for now. Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.
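The delay-and-log approach described above could look roughly like this. It is a sketch under assumptions, not the script the poster ended up with: run_delayed is a hypothetical helper, the log path is an example, and using sudo -u to run the command as myuser is one possible way to avoid running it as root.

```shell
#!/bin/sh
# Sketch: wait before running a command and capture its output in a log file.
# run_delayed is a hypothetical helper name, not a GCE facility.

# run_delayed SECONDS LOGFILE COMMAND [ARGS...]
# Sleeps, then runs COMMAND, appending stdout and stderr to LOGFILE.
# The exit status is that of COMMAND.
run_delayed() {
    delay="$1"
    log="$2"
    shift 2
    sleep "$delay"
    "$@" >> "$log" 2>&1
}

# Usage inside a startup script (delay and paths based on the post above;
# the log file location is an example):
#   run_delayed 40 /var/log/launch_script.log \
#       sudo -u myuser /usr/bin/Rscript /home/myuser/launch_script.R
```

A fixed delay is simple but fragile. If the dependency (here autossh) takes longer than expected, the script still starts too early, so the captured log is what shows whether 40 seconds was enough.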

How to set up and use EC2 CLI on Mac?

I am stuck at using Amazon EC2 CLI.
I have downloaded the Command Line Tools from http://aws.amazon.com/developertools/351.
I placed the bin and lib folders into my Amazon project folder: /Users/Invictus/EC2
I downloaded the cert-xxxx.pem and pk-xxx.pem into the same folder.
I created a .bash_profile in the same folder.
I changed to /Users/Invictus/EC2 with cd and tried to execute ec2-describe-images -o amazon.
The system does not recognise the command: command not found.
If I try to execute the same command inside the bin folder, the result is the same.
My .bash_profile:
export EC2_HOME=~/.EC2
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=`ls $EC2_HOME/pk-*.pem`
export EC2_CERT=`ls $EC2_HOME/cert-*.pem`
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home/
Where did I make a mistake?
My aim is to connect to the launched instance and be able to execute commands there from my local machine.
I have Java installed.
The newer AWS Unified CLI Tools is much, much easier to set up. All you need is Python, which comes built-in to every Mac.
Here are a few things I can think of:
Your .bash_profile should be in /Users/Invictus/ , not /Users/Invictus/EC2. Move it to your home directory and log off and log back in (or restart your machine) and see if it picks up the right path.
Instead of ec2-describe-images, can you run it as "./ec2-describe-images" - does that work? If not, can you check the permissions on that script?
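The checks suggested above can be bundled into one small diagnostic. This is only a sketch: check_cli is a hypothetical helper, not part of the EC2 tools. Note that the posted .bash_profile sets EC2_HOME to ~/.EC2 while the question says the tools were placed in /Users/Invictus/EC2, so the first check may well be the one that fails.

```shell
#!/bin/sh
# Sketch: verify that a CLI tool directory exists, that the tool is executable,
# and that the directory is on PATH. check_cli is a hypothetical helper name.

# check_cli HOME_DIR TOOL: inspect HOME_DIR/bin/TOOL and report the first problem.
check_cli() {
    dir="$1"
    tool="$2"
    if [ ! -d "$dir/bin" ]; then
        echo "missing directory: $dir/bin"
        return 1
    fi
    if [ ! -x "$dir/bin/$tool" ]; then
        echo "not executable: $dir/bin/$tool"
        return 1
    fi
    # Wrap PATH in colons so the match works for first, middle, and last entries.
    case ":$PATH:" in
        *":$dir/bin:"*)
            echo "ok: $dir/bin/$tool is on PATH"
            ;;
        *)
            echo "not on PATH: $dir/bin"
            return 1
            ;;
    esac
}

# Usage (variable and tool name taken from the question):
#   check_cli "$EC2_HOME" ec2-describe-images
```

Running it in a fresh terminal, after moving .bash_profile to the home directory, also confirms whether the profile is being read at all.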