Can't get pip install to work on EMR cluster

I have an EMR (emr-5.30.0) cluster I'm trying to start with a bootstrap file in S3. The contents of the bootstrap file are:
#!/bin/bash
sudo pip3 install --user \
matplotlib \
pandas \
pyarrow \
pyspark
And the error in my stderr file is:
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Command "python setup.py egg_info" failed with error code 1 in /mnt/tmp/pip-build-br9bn1h3/pyspark/
Seems pretty simple...no idea what is going on. Any help is appreciated.
EDIT:
Tried @Dennis Traub's suggestion and got the same error. The new EMR bootstrap looks like this:
#!/bin/bash
sudo pip3 install --upgrade setuptools
sudo pip3 install --user matplotlib pandas pyarrow pyspark

#!/bin/bash
sudo python3 -m pip install matplotlib pandas pyarrow
Do NOT install pyspark. It should already be present on EMR with the required configuration, and installing it again may cause problems.
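One way to follow this advice without editing the package list by hand is to filter out anything EMR already provides before calling pip. The sketch below is only an illustration: the function name `filter_packages` and the `SKIP` list are hypothetical, and the only skipped package is pyspark, as stated above.

```shell
#!/bin/bash
# Packages assumed to be provided by EMR already (assumption: only pyspark,
# per the answer above). Extend this list as needed.
SKIP="pyspark"

# Print the arguments that are not in SKIP, separated by single spaces.
filter_packages() {
    out=""
    for pkg in "$@"; do
        case " $SKIP " in
            *" $pkg "*) ;;                  # already on the cluster: drop it
            *) out="${out:+$out }$pkg" ;;   # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# In the real bootstrap file, pass the filtered list to pip, for example:
#   sudo python3 -m pip install $(filter_packages matplotlib pandas pyarrow pyspark)
```

The install line is left as a comment so the snippet can be tried on any machine; uncomment it in the actual bootstrap file.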

You might have an outdated version of setuptools. Try the following script:
#!/bin/bash
sudo pip3 install --upgrade setuptools
sudo pip3 install --user matplotlib pandas pyarrow pyspark

Related

EMR stuck at bootstrap script

I am trying to run a bootstrap file on EMR to install Facebook Prophet, which seems to require installing dev tools. The bootstrap.sh simply runs:
bootstrap.sh
#!/bin/bash -xe
sudo yum install python3-devel python3-libs python3-tools
#sudo yum groupinstall "Development Tools"
aws s3 cp s3://bucket/requirements.txt .
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade -r ./requirements.txt
The output log shows the below
and the error log shows
but then the cluster is stuck for 1 hour before failing.
I had to add the -y flag to yum in order to bypass any user prompting.
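Bootstrap actions run unattended, so a command that waits for confirmation can stall the bootstrap, as happened here. A minimal sketch of the same script with the prompt suppressed is below. The `run` helper and the `DRY_RUN` variable are hypothetical additions for previewing the commands, and `s3://bucket/requirements.txt` is the placeholder path from the question.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bootstrap() {
    # -y answers yum's confirmation prompt automatically.
    run sudo yum install -y python3-devel python3-libs python3-tools
    run aws s3 cp s3://bucket/requirements.txt .
    run sudo python3 -m pip install --upgrade pip
    run sudo python3 -m pip install --upgrade -r ./requirements.txt
}

# Call `bootstrap` as the last line of the real bootstrap file.
# To preview the commands without running them: DRY_RUN=1 bootstrap
```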

Not able to install boto3

I have Python 2.7 installed. When I check the pip version, it says:
pip --version
9.3
When I try to update pip using the command below:
python -m pip install --upgrade pip
it says pypi.python.org timed out ... Requirement already up to date.
Now I want to install boto3, so I am using the command below:
pip install boto3
But it says:
"retry connection timeout ....
"Could not find a version that satisfies the requirement boto3 (from versions: )"
Any idea why I am not able to install boto3? In fact, I am not able to install any other package either, such as bs4.

How do I install boto3 without pip

My pip package is corrupted. I need to install boto3 using apt-get but I get the following error when I do apt-get install python-boto3:
E: Package 'python-boto3' has no installation candidate
Can someone please help me find an alternate way, or some way to make apt-get work?
Thanks in advance!
apt-get can only install Python packages that your distribution has packaged, and "no installation candidate" means python-boto3 is not available from your configured repositories. pip, by contrast, installs directly from PyPI, resolving dependencies and building wheels as needed.
You can purge the previous installation of pip:
sudo apt-get remove --purge python-pip
then install a new one:
curl https://bootstrap.pypa.io/get-pip.py | sudo python
finally, try installing boto3:
sudo pip install boto3
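After reinstalling pip, it can help to confirm that the interpreter you actually run can import what pip installed. The helper below is a hypothetical sketch (the names `has_module`, `report`, and the `PYTHON` variable are not from the answer); it assumes a `python3` executable unless `PYTHON` is set, so use `PYTHON=python` to check the interpreter used above.

```shell
#!/bin/bash
# Interpreter to check (assumption: python3 unless PYTHON is set).
PYTHON="${PYTHON:-python3}"

# has_module NAME: succeed if the interpreter can import NAME.
has_module() {
    "$PYTHON" -c "import $1" >/dev/null 2>&1
}

# report NAME: print whether NAME is importable.
report() {
    if has_module "$1"; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

report pip
report boto3
```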

AWS EMR stuck bootstrapping

For a course, we are doing a basic tweet analysis using AWS EMR. I followed the steps in this document:
http://docs.aws.amazon.com/en_us/gettingstarted/latest/emr/awsgsg-emr.pdf
The only modifications are that I uploaded a pre-done set of tweets and we are told to use our own config file for the NLTK. The instructor gave us the following for the custom NLTK config:
#!/bin/bash
sudo yum -y install git gcc python-dev python-devel
sudo ln -sf /usr/bin/python2.7 /usr/bin/python
sudo easy_install pip
sudo pip install -U numpy
sudo pip install numpy
sudo easy_install -U distribute
sudo pip install -U setuptools
sudo pip install pyyaml nltk
sudo pip install -e git://github.com/mdp-toolkit/mdp-toolkit#egg=MDP
sudo python -m nltk.downloader -d /usr/share/nltk_data all
I create my cluster and, when it executes, it gets to 'bootstrapping' and has been stuck there for 45 minutes. Using AMI Version 3.11.0, no Hive, Pig, or HUE.
Please let me know if more information is needed to try to diagnose this. What could cause this?
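One way to narrow down which command hangs is to give every bootstrap step a time limit and a log line, so a stalled step fails with a clear message instead of leaving the cluster in 'bootstrapping'. This is only a diagnostic sketch: `run_step` is a hypothetical helper, the time limits are arbitrary, and it relies on the GNU coreutils `timeout` command being available on the instance.

```shell
#!/bin/bash
# run_step SECONDS COMMAND...: run COMMAND with a time limit and log the result.
run_step() {
    limit="$1"
    shift
    echo "START: $*" >&2
    if timeout "$limit" "$@"; then
        echo "OK: $*" >&2
    else
        status=$?
        # GNU timeout exits with 124 when the limit was reached.
        echo "FAILED (exit $status): $*" >&2
        return "$status"
    fi
}

# Example usage in the bootstrap file (limits are arbitrary):
#   run_step 300 sudo yum -y install git gcc python-devel
#   run_step 600 sudo pip install pyyaml nltk
```

The log lines go to stderr, so they should appear in the bootstrap action's stderr log.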

Pandas Seaborn Install

On Ubuntu 12.04 LTS running Python 2.7 I'm getting an install error from attempting to add the great looking Seaborn plotting package to my existing Pandas environment which is running fine.
Here's a snippet from the console containing the errors:
~$ pip install seaborn
running install_lib
creating /usr/local/lib/python2.7/dist-packages/seaborn
error: could not create '/usr/local/lib/python2.7/dist-packages/seaborn':
Permission denied
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_mojo/seaborn/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-LvVao5-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_mojo/seaborn
Storing debug log for failure in /home/mojo/.pip/pip.log
Anyone have a resolution tip not available on the Seaborn github site?
I think the easiest way is to use sudo:
sudo pip install seaborn
Writing to /usr/local/lib requires root permission, which sudo provides.
Note: If you're using anaconda you won't need sudo to install via pip, once you've conda installed pip, though seaborn may also be available via conda.
Personal installation is a good habit to get into:
pip install --user seaborn
However, there is an even easier way: as of writing, python XY maintains up-to-date builds of pandas and seaborn (among other useful packages), so all you have to do is:
sudo add-apt-repository ppa:pythonxy/pythonxy-devel
sudo apt-get update
sudo apt-get install python-seaborn python-pandas
Note that this will only work with python 2.x; you will still need pip3 to install the python 3.x packages.
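To see why the `--user` install mentioned above avoids the permission error, you can print the per-user site-packages directory, which is where pip writes in that mode. The snippet is a small sketch; `user_site` is a hypothetical helper name, and it assumes a `python3` executable unless the hypothetical `PYTHON` variable is set (for example `PYTHON=python` for Python 2.7).

```shell
#!/bin/bash
# Interpreter to inspect (assumption: python3 unless PYTHON is set).
PYTHON="${PYTHON:-python3}"

# user_site: print the per-user site-packages directory.
user_site() {
    "$PYTHON" -m site --user-site
}

echo "pip install --user writes to: $(user_site)"
```

The directory lives under your home directory, so no root access is needed to write to it.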