pycharm: How do I import pyspark to pycharm - python-2.7

I have written quite a few Spark jobs in Java/Scala, where I can run a test Spark job directly from a main() program as long as I add the required Spark jar to the Maven pom.xml.
Now I am starting to work with PySpark, and I am wondering whether I can do something similar. For example, I am using PyCharm to run a wordCount job:
If I just run the main() program, I get the following error:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/profiler/run_profiler.py", line 145, in <module>
profiler.run(file)
File "/Applications/PyCharm.app/Contents/helpers/profiler/run_profiler.py", line 84, in run
pydev_imports.execfile(file, globals, globals) # execute the script
File "/Users/edamame/PycharmProjects/myWordCount/myWordCount.py", line 6, in <module>
from pyspark import SparkContext
ImportError: No module named pyspark
Process finished with exit code 1
How do I import pyspark here, so that I can run a test job from the main() program as I did in Java/Scala?
I also tried to edit the interpreter path:
and my screenshot from Run -> Edit Configuration:
Last is my project structure screen shot:
Did I miss anything here? Thanks!

I finally got it to work by following the steps in this post. It is really helpful!
https://medium.com/data-science-cafe/pycharm-and-apache-spark-on-mac-os-x-990af6dc6f38#.jk5hl4kz0

I added py4j-x.x.x-src.zip and pyspark.zip from $SPARK_HOME/python/lib to the project structure (Preferences > Project > Project Structure, then "+ Add Content Root") and it worked fine.
PS: PyCharm had already read $PYTHONPATH and $SPARK_HOME from the OS environment, where they were set in .bashrc/.bash_profile.
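For a script that should also run outside PyCharm, the same two archives can be put on the module search path at runtime. The following is only a sketch under assumptions: the helper names spark_zip_paths and add_spark_to_sys_path are made up for illustration, and it assumes SPARK_HOME points at a Spark installation that ships both zip files under python/lib.

```python
import glob
import os
import sys


def spark_zip_paths(spark_home):
    """Return the py4j and pyspark zip archives found under spark_home/python/lib."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    # The py4j version differs between Spark releases, so match it with a wildcard.
    candidates = glob.glob(os.path.join(lib_dir, "py4j-*-src.zip"))
    candidates.append(os.path.join(lib_dir, "pyspark.zip"))
    # Keep only the archives that actually exist on disk.
    return [path for path in sorted(candidates) if os.path.isfile(path)]


def add_spark_to_sys_path(spark_home=None):
    """Append the Spark archives to sys.path and return the entries that were added."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    added = []
    for path in spark_zip_paths(spark_home):
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added
```

Calling add_spark_to_sys_path() at the top of the script, before the line "from pyspark import SparkContext", should have roughly the same effect as adding the content roots in PyCharm.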

Related

Getting no module named _internals error while using Bloomberg API

I am currently using Python 2.7 and my OS is Windows 7. While attempting to use the Bloomberg API I am getting this error:
Traceback (most recent call last):
File "datagrab.py", line 1, in <module>
import blpapi, time, json
File "C:\Python27\lib\blpapi\__init__.py", line 5, in <module>
from .internals import CorrelationId
File "C:\Python27\lib\blpapi\internals.py", line 50, in <module>
_internals = swig_import_helper()
File "C:\Python27\lib\blpapi\internals.py", line 42, in swig_import_helper
import _internals
ImportError: No module named _internals
I have set my PATH variable to point to blpapi3_64.dll and also updated my Bloomberg terminal. I have also moved the local blpapi directory to a different location, but the problem persists.
I am fairly new to this API in general, so can someone please guide me?
Thank you in advance!
From your question it sounds like you may have tried this already, but here is one possible solution from the README in the Python Supported Release available here.
Note that many Python installations add the current directory to the
module search path. If the Python interpreter is invoked from the
installer directory, such a configuration will attempt to use the
(incomplete) local blpapi directory as a module. If the above
import line fails with the message Import Error: No module named
_internals, move to a different directory before invoking python.
I know this question is a bit stale, but in case other people end up here like me: do you have the C++ version of blpapi? It is a requirement for the Python API, as mentioned here: https://www.bloomberg.com/professional/support/api-library/
If not, download the C++ zip installer, extract it somewhere, and then add its location as an environment variable so that the Python API can find it:
Environment variable name: BLPAPI_ROOT
Value: C:\blp\blpapi_cpp_3.8.18.1 (THIS IS WHERE MINE IS INSTALLED, YOUR VALUE HERE MAY BE DIFFERENT)
Hope that helps!
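A quick way to confirm that the variable is visible to Python is sketched below. The function name blpapi_root_problem is hypothetical, and the check only verifies that BLPAPI_ROOT is set and points to an existing directory; it does not validate the contents of the C++ SDK.

```python
import os


def blpapi_root_problem(environ=None):
    """Return a description of what is wrong with BLPAPI_ROOT, or None if it looks fine."""
    environ = os.environ if environ is None else environ
    root = environ.get("BLPAPI_ROOT")
    if not root:
        return "BLPAPI_ROOT is not set"
    if not os.path.isdir(root):
        return "BLPAPI_ROOT points to a missing directory: " + root
    return None
```

Run it from a fresh command prompt, since a prompt opened before the variable was added will not see the new value.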

ImportError when trying to execute command "from senti_classifier import senti_classifier"

I'm pretty sure that I've correctly installed the sentiment_classifier package for Python. I have to use the package in one of my programs. The problem is that the import statement throws an error. This is what shows up in my command prompt:
from senti_classifier import senti_classifier
Traceback (most recent call last):
File "", line 1, in <module>
File "build\bdist.win32\egg\senti_classifier\senti_classifier.py", line 229, in <module>
ImportError: No module named collections
However, when I execute the command "import collections", no error is thrown. I only recently started learning Python. Any help is appreciated. Thanks!
PS: The solution from "Sentiment Analysis using senti_classifier and NLTK" is not working for me.

Unable To Run IPython nbconvert From Python2.7 Virtual Environment

I have a virtual environment of Python 2.7 with ipython installed (Ubuntu 16.04.2 (Xenial) LTS.)
When I’m working in the virtual environment (after running source venv/bin/activate in a bash shell from the parent directory of the virtual environment), I have no problem converting my Jupyter notebook from the shell like so:
ipython nbconvert --to html --execute my_notes.ipynb --stdout > /tmp/report.html
But when I try to call that command from a Fabric task using subprocess:
import os
import subprocess

command = ['ipython', 'nbconvert', '--to', 'html', '--execute', notebook_path, '--stdout']
output = subprocess.check_output(command,
                                 cwd=os.environ['PYTHONPATH'],
                                 env=os.environ.copy())
It always fails with this exception, and I cannot find the reason for it:
Traceback (most recent call last):
File "/opt/backend/venv/bin/ipython", line 7, in <module>
from IPython import start_ipython
File "/opt/backend/venv/local/lib/python2.7/site-packages/IPython/__init__.py", line 48, in <module>
from .core.application import Application
File "/opt/backend/venv/local/lib/python2.7/site-packages/IPython/core/application.py", line 25, in <module>
from IPython.core import release, crashhandler
File "/opt/backend/venv/local/lib/python2.7/site-packages/IPython/core/crashhandler.py", line 28, in <module>
from IPython.core import ultratb
File "/opt/backend/venv/local/lib/python2.7/site-packages/IPython/core/ultratb.py", line 119, in <module>
from IPython.core import debugger
File "/opt/backend/venv/local/lib/python2.7/site-packages/IPython/core/debugger.py", line 46, in <module>
from pdb import Pdb as OldPdb
File "/usr/lib/python2.7/pdb.py", line 59, in <module>
class Pdb(bdb.Bdb, cmd.Cmd):
AttributeError: 'module' object has no attribute 'Cmd'
More info to save you time.
I’ve tried:
Using the same paths for PYTHONPATH as I got from the PyCharm run/debug configuration.
Using nbconvert as a Python library, following this documentation.
Calling os.system("ipython nbconvert…").
Wrapping the working command (ipython nbconvert…) in a shell script and calling it from subprocess.check_output and os.system.
Getting drunk and banging my head on a brick wall.
And I always end up with that cursed exception.
Reposting as an answer for completeness:
There was a file called cmd.py somewhere where Python was finding it as an importable module. This was shadowing the cmd module in the standard library, which is used by pdb, which IPython imports. When pdb tried to subclass a class from cmd, that class wasn't there. Moving cmd.py out of the way lets it find the cmd module it needs.
This is an unfortunate annoyance with Python - lots of short words are already used as module names, and using them yourself produces crashes, with a wide range of different errors.
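To track down this kind of shadowing, it can help to list every location on the search path that could satisfy a given import, in the order Python would try them. This is a minimal sketch, not a full reimplementation of the import system: the function name find_module_candidates is made up for illustration, and it only looks for .py files and package directories, ignoring compiled, built-in and frozen modules.

```python
import os
import sys


def find_module_candidates(name, search_paths=None):
    """Return every source file on the search path that could satisfy `import name`."""
    if search_paths is None:
        search_paths = sys.path
    hits = []
    for entry in search_paths:
        # An empty entry in sys.path stands for the current working directory.
        base = entry or os.getcwd()
        package_init = os.path.join(base, name, "__init__.py")
        module_file = os.path.join(base, name + ".py")
        # Within one directory, a package takes precedence over a plain module.
        for candidate in (package_init, module_file):
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits
```

For example, printing find_module_candidates("cmd") from the directory the failing process runs in would be expected to show the stray cmd.py ahead of the standard library's copy. The first hit is the one that wins.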

winpexpect import in a .py not working when running through cmd

Why am I getting this error when I try to run it with python corenlp.py?
Traceback (most recent call last):
File "corenlp.py", line 23, in <module>
from winpexpect import winspawn
File "C:\Python27\lib\site-packages\winpexpect-1.5-py2.7.egg\winpexpect.py", line 391, in <module>
class winspawn(spawn):
TypeError: Error when calling the metaclass bases
function() argument 1 must be code, not str
But when I use it in the PythonShell GUI, it works perfectly.
from winpexpect import winspawn
child = winspawn('java -cp "C:\\Python27\\Scripts\\stanford-corenlp-full-2014-08-27\\*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -ner.model edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz')
You are getting this error locally because your Python27 doesn't have a prerequisite component installed. In particular it needs Pywin32 installed. You need to download and install Pywin32 from here (specifically the 2.7 version in your case). PythonShell GUI must have this extension already installed so it works.
Finally I was able to resolve my problem (thanks to @MichaelPetch). I tried this simple example, test.py:
from winpexpect import winspawn
child=winspawn("java")
It worked fine, so I realized something was wrong with my imports or dependencies. I had been using wexpect.py, which is an alternative for Windows; I had renamed it to pexpect.py and copied it into my code base.
But winpexpect depends on pexpect (it ships its own pexpect file in its own folder). When I ran my .py through cmd, the import resolved to the pexpect file in the same folder as my script, which was the file I had just renamed.
Solution: I just removed wexpect.

Django App Engine can't find antlr3 module

I'm trying to set up a Django app to run on GAE, and am using the on_production_server test to choose between dev vs. production settings in settings.py.
However, when I run
python manage.py runserver
I get:
Traceback (most recent call last):
File "manage.py", line 11, in <module>
import settings
File "/home/guillaume/myproject/settings.py", line 10, in <module>
from djangoappengine.utils import on_production_server, have_appserver
File "/home/guillaume/myproject/djangoappengine/utils.py", line 18, in <module>
'Error was: %s' % e)
Exception: Could not get appid. Is your app.yaml file missing?
Error was: No module named antlr3
I tried adding the following to settings.py:
import sys
sys.path.append('/usr/local/google_appengine/lib/')
And this line to the very end of .profile:
PATH="$PATH:/usr/local/google_appengine/"
But neither gets rid of the error. I'm really new to working with paths so I'm kind of fumbling around blindly here. Can anyone help?
Which Python version are you using, 2.5 or 2.7?
And which version of the GAE SDK?
Did you try this?
I saw this question while having the same problem. I solved it by installing the antlr 3.1.1 Python runtime from here.
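After installing the runtime, one way to confirm that the interpreter running manage.py can actually see antlr3 is to try the import and report where it came from. This is a hedged sketch; module_origin is a hypothetical helper name, and it relies only on importlib.import_module from the standard library.

```python
import importlib


def module_origin(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules have no __file__ attribute.
    return getattr(module, "__file__", "<built-in>")
```

If module_origin("antlr3") returns None when called with the same interpreter that runs manage.py, the module is still not on that interpreter's search path. Note that the PATH variable set in .profile only affects which executables the shell finds, not where Python looks for modules.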