AWS EMR stuck bootstrapping - amazon-web-services

For a course, we are doing a basic tweet analysis using AWS EMR. I followed the steps in this document:
http://docs.aws.amazon.com/en_us/gettingstarted/latest/emr/awsgsg-emr.pdf
The only modifications are that I uploaded a pre-done set of tweets and we are told to use our own config file for the NLTK. The instructor gave us the following for the custom NLTK config:
#!/bin/bash
sudo yum -y install git gcc python-dev python-devel
sudo ln -sf /usr/bin/python2.7 /usr/bin/python
sudo easy_install pip
sudo pip install -U numpy
sudo pip install numpy
sudo easy_install -U distribute
sudo pip install -U setuptools
sudo pip install pyyaml nltk
sudo pip install -e git://github.com/mdp-toolkit/mdp-toolkit#egg=MDP
sudo python -m nltk.downloader -d /usr/share/nltk_data all
I create my cluster and, when it executes, it gets to 'bootstrapping' and has been stuck there for 45 minutes. Using AMI Version 3.11.0, no Hive, Pig, or HUE.
Please let me know if more information is needed to try to diagnose this. What could cause this?

Related

EMR stuck at bootstrap script

I am trying to run a bootstrap file on EMR to install Facebook Prophet, which seems to require installing dev tools. The bootstrap.sh simply runs:
bootstrap.sh
#!/bin/bash -xe
sudo yum install python3-devel python3-libs python3-tools
#sudo yum groupinstall "Development Tools"
aws s3 cp s3://bucket/requirements.txt .
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade -r ./requirements.txt
The output and error logs are not reproduced here; the cluster is then stuck for 1 hour before failing.
I had to add the -y flag to yum in order to bypass any user prompting.
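A minimal sketch of the bootstrap script with the -y fix applied, assuming the same packages and the placeholder S3 path from the question. The DRY_RUN switch and the run helper are hypothetical names added here so the script can be tried off-cluster; with DRY_RUN left at 1 it only prints the commands.

```shell
#!/bin/bash
set -eu

# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 on the cluster to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command, then run it unless we are in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

bootstrap() {
  # -y answers "yes" to every yum prompt, so the step never waits on stdin.
  run sudo yum install -y python3-devel python3-libs python3-tools
  # s3://bucket/requirements.txt is the placeholder path from the question.
  run aws s3 cp s3://bucket/requirements.txt .
  run sudo python3 -m pip install --upgrade pip
  run sudo python3 -m pip install --upgrade -r ./requirements.txt
}

bootstrap
```

A bootstrap action has no terminal attached, so any command that asks for confirmation will wait until the cluster times out; that is why every package-manager call should be non-interactive.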

Can't get pip install to work on EMR cluster

I have an EMR (emr-5.30.0) cluster I'm trying to start with a bootstrap file in S3. The contents of the bootstrap file are:
#!/bin/bash
sudo pip3 install --user \
matplotlib \
pandas \
pyarrow \
pyspark
And the error in my stderr file is:
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Command "python setup.py egg_info" failed with error code 1 in /mnt/tmp/pip-build-br9bn1h3/pyspark/
Seems pretty simple...no idea what is going on. Any help is appreciated.
EDIT:
Tried @Dennis Traub's suggestion and got the same error. The new EMR bootstrap looks like this:
#!/bin/bash
sudo pip3 install --upgrade setuptools
sudo pip3 install --user matplotlib pandas pyarrow pyspark
#!/bin/bash
sudo python3 -m pip install matplotlib pandas pyarrow
Do NOT install pyspark. It should already be there in EMR with the required config, and installing it again may cause problems.
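One way to make this advice hard to forget is to strip pyspark from the package list before building the pip command. This is only a sketch: filter_packages and build_install_command are hypothetical helper names, and the command is printed rather than executed.

```shell
#!/bin/bash
set -eu

# Package list from the question.
REQUESTED=(matplotlib pandas pyarrow pyspark)

# Print every argument except pyspark, one per line.
filter_packages() {
  local pkg
  for pkg in "$@"; do
    if [ "$pkg" != "pyspark" ]; then
      printf '%s\n' "$pkg"
    fi
  done
}

# Print (not run) the pip command for the filtered list.
build_install_command() {
  local pkgs
  pkgs="$(filter_packages "$@" | tr '\n' ' ')"
  echo "sudo python3 -m pip install ${pkgs% }"
}

build_install_command "${REQUESTED[@]}"
```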
You might have an outdated version of setuptools. Try the following script:
#!/bin/bash
sudo pip3 install --upgrade setuptools
sudo pip3 install --user matplotlib pandas pyarrow pyspark

Error of "Command 'pip' not found" when trying to install requirements.txt

I'm trying to do: pip install -r requirements.txt on an AWS server. I recently pulled a git commit.
I get this error:
Command 'pip' not found, but can be installed with:
sudo apt install python-pip
So I tried entering:
sudo apt install python-pip install -r requirements.txt
and then
sudo apt install python-pip -r requirements.txt
But both attempts gave me this error:
E: Command line option 'r' [from -r] is not understood in combination with the other options.
What is the correct command to install this? Thank you.
You are mixing multiple commands.
apt: This is Debian's package manager. It installs system packages, including pip itself, but it does not understand pip options such as -r. There are also other ways of installing pip.
pip: This is Python's package manager. You can install your project's dependencies by listing them in requirements.txt.
The correct way would be:
sudo apt install python-pip
#install from a file requirements.txt:
sudo pip install -r requirements.txt
#install for the current user only (note that -U means --upgrade, not --user):
pip install --user -r requirements.txt
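The two steps can be combined into a small check that looks for pip before using it. This is a sketch under assumptions: first_available is a hypothetical helper, and the script only prints what it would run rather than installing anything.

```shell
#!/bin/bash
set -eu

# Print the first of the given command names that exists on PATH.
first_available() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

if pip_cmd="$(first_available pip3 pip)"; then
  echo "would run: $pip_cmd install --user -r requirements.txt"
else
  # apt installs pip itself; pip then handles requirements.txt.
  echo "pip not found; install it first: sudo apt install python-pip"
fi
```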

How do I install boto3 without pip

My pip package is corrupted. I need to install boto3 using apt-get but I get the following error when I do apt-get install python-boto3:
E: Package 'python-boto3' has no installation candidate
Can someone please help me find an alternative way, or some way to make apt-get work?
Thanks in advance!
apt-get can only install Python packages that your distribution has packaged, and the error says python-boto3 has no installation candidate in your configured repositories. pip installs directly from PyPI, resolving dependencies and building wheels as needed.
You can purge the previous installation of pip:
sudo apt-get remove --purge python-pip
then install a new one:
curl https://bootstrap.pypa.io/get-pip.py | sudo python
finally, try installing boto3:
sudo pip install boto3
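After reinstalling, it can help to confirm that the interpreter really sees the package. A minimal sketch, assuming the interpreter is called python as in the commands above; has_module is a hypothetical helper name.

```shell
#!/bin/bash
set -eu

# Succeed if the given interpreter can import the given module.
has_module() {
  local interpreter="$1" module="$2"
  "$interpreter" -c "import $module" >/dev/null 2>&1
}

if has_module python boto3; then
  echo "boto3 is importable"
else
  echo "boto3 is missing; try: sudo pip install boto3"
fi
```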

how to install scrapy on ubuntu?

I know that installing Scrapy requires installing w3lib first, so I installed w3lib first, but when I import scrapy in the Python IDE, the program crashes.
The error:
creating Twisted.egg-info
writing requirements to Twisted.egg-info\requires.txt
writing Twisted.egg-info\PKG-INFO
writing top-level names to Twisted.egg-info\top_level.txt
writing dependency_links to Twisted.egg-info\dependency_links.txt
writing manifest file 'Twisted.egg-info\SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
reading manifest file 'Twisted.egg-info\SOURCES.txt'
writing manifest file 'Twisted.egg-info\SOURCES.txt'
copying twisted\internet\_sigchld.c -> build\lib.win-amd64-2.7\twisted\internet
creating build\lib.win-amd64-2.7\twisted\internet\iocpreactor\iocpsupport
copying twisted\internet/iocpreactor/iocpsupport\iocpsupport.c -> build\lib.win-amd64-2.7\twisted\internet/iocpreactor/iocpsupport
copying twisted\internet/iocpreactor/iocpsupport\winsock_pointers.c -> build\lib.win-amd64-2.7\twisted\internet/iocpreactor/iocpsupport
copying twisted\python\_epoll.c -> build\lib.win-amd64-2.7\twisted\python
copying twisted\python\_initgroups.c -> build\lib.win-amd64-2.7\twisted\python
copying twisted\python\sendmsg.c -> build\lib.win-amd64-2.7\twisted\python
copying twisted\runner\portmap.c -> build\lib.win-amd64-2.7\twisted\runner
copying twisted\test\raiser.c -> build\lib.win-amd64-2.7\twisted\test
running build_ext
What's wrong?
This is how I installed scrapy on ubuntu:
sudo apt-get update
sudo apt-get install python-pip build-essential python-dev libxslt-dev libxml2-dev
sudo -H pip install Scrapy
scrapy version
The important thing that solved my issues was sudo -H pip install Scrapy, specifically the -H flag.
I also exited the terminal and started a new one to ensure that all the environment variables were set correctly.
Make sure you have installed Twisted, pyOpenSSL and pycrypto.
These are my steps to install Scrapy on Ubuntu.
1. Install gcc and lxml:
sudo apt-get install python-dev
sudo apt-get install libevent-dev
sudo apt-get install libxml2 libxml2-dev
sudo apt-get install libxml2-dev libxslt-dev
sudo apt-get install python-lxml
2. Install Twisted:
sudo apt-get install python-twisted python-libxml2 python-simplejson
sudo apt-get install build-essential libssl-dev libffi-dev python-dev
3. Install pyOpenSSL:
wget http://pypi.python.org/packages/source/p/pyOpenSSL/pyOpenSSL-0.13.tar.gz
tar -zxvf pyOpenSSL-0.13.tar.gz
cd pyOpenSSL-0.13
sudo python setup.py install
4. Install pycrypto:
wget http://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.5.tar.gz
tar -zxvf pycrypto-2.5.tar.gz
cd pycrypto-2.5
sudo python setup.py install
5. Install easy_install (if you don't have it):
wget http://peak.telecommunity.com/dist/ez_setup.py
python ez_setup.py
6. Install w3lib:
sudo easy_install -U w3lib
7. Install Scrapy:
sudo easy_install Scrapy
First, install the system dependencies:
sudo apt-get install -y \
python-dev python-pip python-setuptools \
libffi-dev libxml2-dev libxslt1-dev \
libtiff5-dev libjpeg62-turbo-dev zlib1g-dev libfreetype6-dev \
liblcms2-dev libwebp-dev tcl8.5-dev tk8.5-dev python-tk
Then add the following to your requirements.txt:
lxml
pyOpenSSL
Scrapy
Pillow
And finally, run pip install -r requirements.txt.
You can also look around gist.github.com to resolve the latest dependency issues. I'm using Docker to set up the Scrapy dependencies in a separate container.
I've created one for my own needs.