Python BeautifulSoup UserWarning: No parser was explicitly specified - python-2.7

After I installed BeautifulSoup, this warning appears whenever I run my Python script from the command line:
D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166:
UserWarning: No parser was explicitly specified, so I'm using the best
available HTML parser for this system ("html.parser"). This usually isn't a
problem, but if you run this code on another system, or in a different
virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html.parser")
I have no idea why it appears or how to fix it.

The solution to your problem is clearly stated in the warning message. Code like the following does not specify an XML/HTML/etc. parser:
BeautifulSoup( ... )
To get rid of the warning, specify which parser you'd like to use, like so:
BeautifulSoup( ..., "html.parser" )
You can also install a 3rd party parser if you'd like.

Documentation recommends that you install and use lxml for speed.
BeautifulSoup(html, "lxml")
If you’re using a version of Python 2 earlier than 2.7.3, or a version
of Python 3 earlier than 3.2.2, it’s essential that you install lxml
or html5lib. Python’s built-in HTML parser is just not very good in
older versions.
Installing the lxml parser
On Ubuntu (Debian-based):
apt-get install python-lxml
On Fedora (RHEL-based):
dnf install python-lxml
Using pip:
pip install lxml
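Before passing "lxml" or "html5lib" to BeautifulSoup, it can help to confirm that the parser is importable in the interpreter you are actually running. A minimal sketch using only the standard library; available_parsers is a hypothetical helper name, and it only checks that the backing modules can be found, not that bs4 will accept them:

```python
import importlib.util

def available_parsers():
    """Return the parser names (as BeautifulSoup feature strings) whose
    backing modules can be found by the current interpreter."""
    # "html.parser" ships with the standard library, so it is always present.
    found = ["html.parser"]
    for name in ("lxml", "html5lib"):
        # find_spec returns None when the top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            found.append(name)
    return found

# Lists 'lxml' and/or 'html5lib' only if they are installed for this interpreter.
print(available_parsers())
```

Whichever name you pick, still pass it explicitly, e.g. BeautifulSoup(html, "lxml"), so the parser does not silently change from one machine to another.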

In my opinion, the previous posts did not answer the question.
Yes, as everyone said, you can remove the warning by specifying the parser.
And as pointed out by the documentation, it is a best practice for performance 1 and for consistency 2.
But in some cases you just want to silence the warning, hence this post.
Since BeautifulSoup 4 rev 460, the warning message no longer appears in interactive (REPL) mode.
There are more general answers at "How to disable Python warnings?" for controlling Python warnings (TL;DR: PYTHONWARNINGS=ignore or -Wignore).
You can suppress the warning explicitly (bs4 ≥ rev 569) by adding this to your code:
import warnings
from bs4 import GuessedAtParserWarning
warnings.filterwarnings('ignore', category=GuessedAtParserWarning)
Or cheat by letting bs4 think you provided the parser:
import bs4
bs4.BeautifulSoup(
    your_markup,
    builder=bs4.builder_registry.lookup(*bs4.BeautifulSoup.DEFAULT_BUILDER_FEATURES)
)
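The category filter above is narrower than -Wignore: it silences only that one warning class (and its subclasses), while other warnings still show. If you would rather not change the filters for the whole process, warnings.catch_warnings() restores them when the block exits. The sketch below uses a hypothetical stand-in class, ParserGuessWarning, and a fake noisy_parse() function so it runs without bs4 installed; in real code you would use bs4's GuessedAtParserWarning and your own BeautifulSoup(...) call instead:

```python
import warnings

class ParserGuessWarning(UserWarning):
    """Hypothetical stand-in for bs4's GuessedAtParserWarning."""

def noisy_parse():
    # Simulates a library call that emits two different warnings.
    warnings.warn("no parser specified", ParserGuessWarning)
    warnings.warn("something unrelated", UserWarning)

def collect_warnings():
    """Run noisy_parse() with the parser warning ignored; return what got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning by default...
        # ...except this one category; this filter is checked first.
        warnings.filterwarnings("ignore", category=ParserGuessWarning)
        noisy_parse()
    # The previous filters are restored here, so the rest of the program is unaffected.
    return [str(w.message) for w in caught]

print(collect_warnings())  # ['something unrelated']
```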

To use the html5lib parser, install it first:
pip install html5lib
then pass 'html5lib' as the parser argument to BeautifulSoup (req1 here is a previously fetched response):
htmlDoc = bs4.BeautifulSoup(req1.text, 'html5lib')
print(htmlDoc)

Related

python package urllib2 working but cannot find it

I have a piece of code that is working:
from urllib2 import urlopen
html = urlopen("http://jr.jd.com")
print(html.read())
html.close()
My problem is that I can't find the "urllib2" package on my Mac.
The Python version is 2.7 (Apple built-in).
I ran pip list to list the installed packages; it shows "urllib3" but not "urllib2".
(I don't think the "urllib3" package includes "urllib2" by default; they work very differently. To use "urllib3", the first line of my code would have to change to from urllib3.request import urlopen.)
I ran pip show urllib3, and since that package is installed, the output is as expected:
Name: urllib3
Location: /usr/local/lib/python2.7/site-packages
Then I ran pip show urllib2, and there was no output at all! I think this means I don't have the "urllib2" package installed? (Then why does my code work?)
Could someone explain how my code works while I can't find the "urllib2" package? Many thanks!
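A note that is not from the original thread: pip list only shows installed distributions that carry packaging metadata, while urllib2 is part of Python 2's standard library and ships with the interpreter itself, so pip never lists it (in Python 3 its functionality moved into urllib.request). One way to see where an imported module really comes from is its __file__ attribute. A minimal sketch; module_location is a hypothetical helper name:

```python
import importlib

def module_location(name):
    """Import `name` and return the file it was loaded from.

    Modules compiled into the interpreter (such as sys) have no __file__,
    so a placeholder is returned for them instead.
    """
    module = importlib.import_module(name)
    return getattr(module, "__file__", "<built-in>")

print(module_location("json"))  # a path inside the interpreter's standard library
print(module_location("sys"))   # <built-in>
# On Python 2.7, module_location("urllib2") should point into the interpreter's
# own lib/python2.7 directory rather than into site-packages.
```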

ImportError: No module named vaderSentiment

I'm trying to run some Python 2.7 code on Windows that uses sentiment analysis:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
and I'm getting this error:
ImportError: No module named vaderSentiment
Can anyone help me with this?
I assume you solved this one since it's from 7 months ago, but for anyone else searching for it:
Go into terminal/cmd and paste the following:
pip install vaderSentiment
More info on VADER: https://github.com/cjhutto/vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#note: depending on how you installed (e.g., using source code download versus pip install), you may need to import like this:
#from vaderSentiment import SentimentIntensityAnalyzer
Read the comment in the code.
Try running your file with python3 instead of just python. When several Python/pip installations exist on your computer, vaderSentiment may be installed for Python 2 when you need it under Python 3.
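The two import paths from the code comment above and the interpreter check from the last answer can be combined into one small helper. A sketch, assuming vaderSentiment may or may not be installed; import_vader is a hypothetical name, and the hint uses python -m pip so the package goes into the same interpreter that runs the script:

```python
import sys

def import_vader():
    """Try both import paths; return SentimentIntensityAnalyzer or None."""
    try:
        # Layout of the pip-installed package.
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        return SentimentIntensityAnalyzer
    except ImportError:
        pass
    try:
        # Layout mentioned for some source-code installs.
        from vaderSentiment import SentimentIntensityAnalyzer
        return SentimentIntensityAnalyzer
    except ImportError:
        return None

analyzer_cls = import_vader()
if analyzer_cls is None:
    # Installing via "<this interpreter> -m pip" avoids a pip/python mismatch.
    print("vaderSentiment is not importable from", sys.executable)
    print("Try:", sys.executable, "-m pip install vaderSentiment")
else:
    print(analyzer_cls().polarity_scores("VADER is smart, handsome, and funny."))
```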

Importing Numpy in embedded Python c++ application

I would like a script running in a C++ application's embedded Python runtime to import numpy, by setting the runtime's path so that it knows about the numpy module located in site-packages.
However I get the error:
cannot import name 'multiarray'
from \Lib\site-packages\numpy\core\__init__.py, on the line
from . import multiarray
I have tried setting os.path to xxx\numpy\core, but it still can't seem to find the multiarray.pyd file during the import.
I have read through similar questions posed but none of the answers seem relevant to my case.
I am using Python 3.4.4 (32 bit) and have installed Numpy 1.11.1 using the wheel
numpy-1.11.1-cp34-none-win32.whl
python -m pip install numpy-1.11.1-cp34-none-win32.whl
Completed without any errors.
It seems like the failure may be more general than just an incomplete PYTHONPATH?
I also think it might be broader than NumPy: any .pyd-based package imported from the embedded environment may have this problem?
Any help appreciated.
Did you make sure all of the NumPy includes (\numpy\core\include\numpy\) were present during the build? The only time I get this type of error is when the build can't find all of the NumPy includes. Also, when embedding, I found that the entire numpy directory (already built on your build machine) has to be inside a directory listed in Py_SetPath(python35.lib;importlibs), assuming importlibs is a directory containing NumPy and anything else you want to bundle.
The answer seems to have been to install Python 3.4.1, matching the version (3.4.1) of the python34.dll being embedded.
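Since the fix was a version match between the embedded python34.dll and the Python that NumPy was installed for, one quick check is to run the same small script both inside the embedded interpreter and in the standalone python.exe and compare the results. A sketch; runtime_report is a hypothetical helper, and importlib.machinery.EXTENSION_SUFFIXES lists the file suffixes the running interpreter accepts for compiled extensions such as .pyd files:

```python
import importlib.machinery
import struct
import sys

def runtime_report():
    """Collect facts that should agree between the embedded and standalone runtimes."""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Pointer size in bits: 32 or 64, which must match the wheel (win32 here).
        "bits": struct.calcsize("P") * 8,
        "prefix": sys.prefix,
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
        "sys_path": list(sys.path),
    }

for key, value in runtime_report().items():
    print(key, "=", value)
```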

mvpa2.suite: Runtime warning and error in Python 2.7.6

I just installed the mvpa2 module on my Ubuntu 14.04 machine (Python 2.7.6), following the instructions at http://www.pymvpa.org/installation.html and using sudo aptitude install python-mvpa2.
The command import mvpa2 works well, but when I run from mvpa2.suite import *, I get the following warning in my terminal:
/usr/local/lib/python2.7/dist-packages/sklearn/pls.py:7: DeprecationWarning: This module has been moved to cross_decomposition and will be removed in 0.16
"removed in 0.16", DeprecationWarning)
And also the following error:
TypeError: __init__() got an unexpected keyword argument 'rho'
Appreciate your help!
Actually, this warning comes from an import done by mdp, which PyMVPA optionally uses. You can safely ignore it (no upgrade of PyMVPA would help anyway): even if that module gets removed completely, mdp would simply skip the import and you would remain 'golden'.
That problem is due to an incompatibility between the python-mvpa2 and scikit-learn versions. You can find more details on this page; which parameters a given function accepts depends on the scikit-learn version you have.
A short solution is to uninstall your python-mvpa2 and scikit-learn, and install them directly from their github repos:
[python-mvpa2] https://github.com/PyMVPA
[scikit-learn] https://github.com/scikit-learn/scikit-learn
I just did it and now the example doc/examples/som.py (for my case) is working perfectly.
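The TypeError about 'rho' is what Python raises when a constructor is given a keyword argument its signature does not define, which is why mismatched python-mvpa2 and scikit-learn versions fail this way. If you need code that tolerates both versions, one option is to check the signature before calling. A sketch with two hypothetical stand-in classes (OldModel, NewModel), not scikit-learn's actual estimators:

```python
import inspect

class OldModel:
    """Hypothetical stand-in for an estimator that still accepts `rho`."""
    def __init__(self, alpha=1.0, rho=0.5):
        self.alpha, self.rho = alpha, rho

class NewModel:
    """Hypothetical stand-in for a later version where `rho` was removed."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

def build(cls, **kwargs):
    """Instantiate `cls`, dropping keyword arguments its __init__ does not declare.

    Note: a constructor that takes **kwargs would have extra names dropped too.
    """
    accepted = inspect.signature(cls).parameters
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    dropped = sorted(set(kwargs) - set(supported))
    if dropped:
        print("%s does not accept %s; ignoring" % (cls.__name__, dropped))
    return cls(**supported)

old = build(OldModel, alpha=0.1, rho=0.7)
new = build(NewModel, alpha=0.1, rho=0.7)  # NewModel(rho=0.7) would raise TypeError
```

Silently dropping arguments can hide real mistakes, so installing matching versions, as the answer above does, is the more robust fix; the check is only a stopgap.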

Pandas import error

I tried installing pandas using easy_install, and it claimed that it had successfully installed the pandas package in my Python directory.
I then switched to IDLE and tried import pandas, and it gave me the following error:
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
import pandas
File "C:\Python27\lib\site-packages\pandas-0.12.0-py2.7-win32.egg\pandas\__init__.py", line 6, in <module>
from . import hashtable, tslib, lib
File "numpy.pxd", line 157, in init pandas.hashtable (pandas\hashtable.c:20282)
ValueError: numpy.dtype has the wrong size, try recompiling
Please help me diagnose the error.
FYI: I have already installed the numpy package
Maybe you interrupted the pandas install; retry using pip.
First install pip (if you haven't already):
easy_install pip
then reinstall pandas:
pip install pandas --upgrade
Hope it helps
You know that error output you got when you tried running nipun-batra's script?
Well, you got it because you have to first:
import platform
before you can run:
platform.platform()
I know this because about 10 minutes ago I got the same error when trying to run the same script. The difference is that I, an absolute beginner, figured out our problem after a quick trip over to Google. (Man, they let you search for anything over there!)
This, combined with your follow-up appeal exactly two months after your initial post, suggests to me that you would prefer to minimize, as much as possible, the usual hardship associated with owning and operating your own computer-machine-thingy.
As a result, for your original IDLE/pandas issue, your best bet is to forget about messing around with easy_install, etc. Instead, head on down to Continuum Analytics and pick up your very own (free) copy of Anaconda, which has more packages than you can shake a stick at! (Including, I might add, pandas, numpy, scipy, statsmodels, matplotlib, IPython, and many more.) And the best part is that it all comes bundled together as a single easy-to-download file. Trust me, it will save you a lot of headaches if you just download everything all at once.
Hope this helps!
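In the same spirit as the platform.platform() call above, a slightly longer report also shows which NumPy and pandas versions are installed and whether they import at all. "numpy.dtype has the wrong size" typically means pandas was compiled against a different NumPy version than the one installed. A sketch; environment_report is a hypothetical helper:

```python
import importlib
import platform

def environment_report(packages=("numpy", "pandas")):
    """Summarise the interpreter and the versions of `packages`."""
    report = {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "architecture": platform.architecture()[0],  # e.g. '32bit' or '64bit'
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # ImportError, or a ValueError like the one above
            report[name] = "import failed: %r" % (exc,)
    return report

for key, value in environment_report().items():
    print("%-12s %s" % (key, value))
```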
Current pandas releases no longer support Python 2.7 (the 0.24.x series was the last to support it), so you will need Python 3 to use a recent pandas.