Spacy is_stop function(bug?) - python-2.7

I am using the code below to check whether a word is a stop word. As you can see, when the try block fails, the is_stop call in the except block throws an error.
import spacy
nlp = spacy.load('en')
try:
    print 0/0  # raise an exception
except:
    print nlp.is_stop('is')
I get the below error:
5 print 0/0
6 except:
----> 7 print spacy.load('en').is_stop('is')
AttributeError: 'English' object has no attribute 'is_stop'

You need to process some text first by calling the nlp object as a function. You can then test for stop words on each token of the parsed sentence.
For example:
>>> import spacy
>>> nlp = spacy.load('en')
>>> sentence = nlp(u'this is a sample sentence')
>>> sentence[1].is_stop
True
In case you want to test for stop words directly from the English vocabulary, use the following:
>>> nlp.vocab[u'is'].is_stop
True

print text file from web server to python program print errors

I'm trying to print a text file from a web server in a Python program, but I am receiving errors. Any help would be greatly appreciated. Here is my code:
import RPi.GPIO as GPIO
import urllib2
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
GPIO.setup(5,GPIO.OUT)
true = 1
while(true):
    try:
        response = urllib2.urlopen('http://148.251.158.132/k.txt')
        status = response.read()
    except urllib2.HTTPError, e:
        print e.code
    except urllib2.URLError, e:
        print e.args
    print status
    if status=='bulbion':
        GPIO.output(5,True)
    elif status=='bulbioff':
        GPIO.output(5,False)
From your comments, it appears the first error, "SyntaxError: Missing parentheses in call to print", is caused by omitting the parentheses in your print statements. People usually hit this after moving to Python 3, where print is a function; the old Python 2 print statement never required parentheses. The other error, "SyntaxError: unindent does not match any outer indentation level", occurs because the print statement on line 16 is one space behind the other statements at that indentation level. You can fix it by moving the print statement one space forward.
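Changing your code to this should fix the problems. Note that Python 3 also replaces urllib2 with urllib.request and urllib.error, requires "except ... as e", and returns bytes from read(), so those are adjusted here too:
import RPi.GPIO as GPIO
import urllib.request
import urllib.error
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
GPIO.setup(5, GPIO.OUT)
while True:
    try:
        response = urllib.request.urlopen('http://148.251.158.132/k.txt')
        # read() returns bytes on Python 3; decode before comparing to str
        status = response.read().decode('utf-8')
    except urllib.error.HTTPError as e:
        print(e.code)
        continue  # status is not set when the request fails
    except urllib.error.URLError as e:
        print(e.args)
        continue
    print(status)
    if status == 'bulbion':
        GPIO.output(5, True)
    elif status == 'bulbioff':
        GPIO.output(5, False)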
Hope this helps!
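As a hardware-free sketch of the comparison step, the helper below maps the raw response body to the value that would be passed to GPIO.output. The name status_to_output is hypothetical (it is not part of RPi.GPIO or urllib), and the code assumes the server may send bytes and may append a trailing newline, neither of which the question confirms:

```python
def status_to_output(raw):
    """Map the response body to True (on), False (off) or None (unknown)."""
    # urlopen(...).read() returns bytes on Python 3, so decode first.
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8')
    # Assumption: tolerate a trailing newline or spaces in k.txt.
    status = raw.strip()
    if status == 'bulbion':
        return True
    if status == 'bulbioff':
        return False
    return None
```

In the loop, GPIO.output(5, value) would then be called only when the returned value is not None.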

How to convert CJK Extension B text from a QLineEdit (Python 3, PyQt4) to UTF-8 in order to process it with regex

I have code like this:
#!/usr/bin/env python3
#-*-coding:utf-8-*-
from PyQt4 import QtGui, QtCore
import re
.....
str = self.lineEdit.text() # lineEdit is a object in QtGui.QLineEdit class
# This line thanks to Fedor Gogolev et al from
#https://stackoverflow.com/questions/12214801/print-a-string-as-hex-bytes
print('\\u'+"\\u".join("{:x}".format(ord(c)) for c in str))
# u+20000-u+2a6d6 is CJK Ext B
cjk = re.compile("^[一-鿌㐀-䶵\U00020000-\U0002A6D6]+$",re.UNICODE)
if cjk.match(str):
    print("OK")
else:
    print("error")
When I input "敏感詞" (0x654F, 0x611F, 0x8A5E in UTF-16, respectively), the result was:
\u654f\u611f\u8a5e
OK
But when I input "詞𠀷𠂁𠁍" (0x8A5E, 0xD840 0xDC37, 0xD840 0xDC81, 0xD840 0xDC4D in UTF-16), which contains 3 characters from the CJK Extension B area, the result is not what I expected:
\u8a5e\ud840\udc37\ud840\udc81\ud840\udc4d
error
How can I convert these CJK characters to UTF-8 so that they can be processed properly by Python 3's re module?
P.S.
The value of sys.maxunicode is 1114111, so the build should be UCS-4. Hence, I think this question is not the same as
"python regex fails to match a specific Unicode > 2 hex values".
Another piece of code:
#!/usr/bin/env python3
#-*-coding:utf-8-*-
import re
CJKBlock = re.compile("^[一-鿌㐀-䶵\U00020000-\U0002A6D6]+$") #CJK ext B
print(CJKBlock.search('詞𠀷𠂁𠁍'))
returns <_sre.SRE_Match object; span=(0, 4), match='詞𠀷𠂁𠁍'>, which is the expected result.
Even when I added self.lineEdit.setText("詞𠀷𠂁𠁍") inside the __init__ function of the window class and ran it, the text in the line edit displayed correctly, but when I pressed Enter, the result was still "error".
version:
Python3.4.3
Qt version: 4.8.6
PyQt version: 4.10.4.
There were a few PyQt4 bugs following the implementation of PEP-393 that can affect conversions between QString and Python strings. If you use sip to switch to the v1 API, you should probably be able to confirm that the QString returned by the line-edit does not contain surrogate pairs. But if you then convert it to a Python string, the surrogates should appear.
Here is how to test this in an interactive session:
>>> import sip
>>> sip.setapi('QString', 1)
>>> from PyQt4 import QtGui
>>> app = QtGui.QApplication([])
>>> w = QtGui.QLineEdit()
>>> w.setText('詞𠀷𠂁𠁍')
>>> qstr = w.text()
>>> qstr
PyQt4.QtCore.QString('詞𠀷𠂁𠁍')
>>> pystr = str(qstr)
>>> print('\\u' + '\\u'.join('{:x}'.format(ord(c)) for c in pystr))
\u8a5e\u20037\u20081\u2004d
Of course, this last line does not show surrogates for me, because I cannot do the test with PyQt-4.10.4. I have tested with PyQt-4.11.1 and PyQt-4.11.4, though, and I did not see any problems. So you should try to upgrade to one of those.
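If upgrading is not an option, one possible workaround (a sketch, not something tested against PyQt-4.10.4) is to re-join the surrogate pairs on the Python side before applying the regex. The helper name join_surrogates is hypothetical, and the 'surrogatepass' error handler for UTF-16 requires Python 3.4 or later:

```python
import re


def join_surrogates(text):
    """Combine UTF-16 surrogate pairs in a str into single code points."""
    # 'surrogatepass' lets the lone surrogate code points be encoded as
    # raw UTF-16 units; decoding then pairs them up again.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')


# Same character ranges as the pattern in the question.
cjk = re.compile("^[一-鿌㐀-䶵\U00020000-\U0002A6D6]+$")

# The string as reported in the question: one BMP character followed by
# three surrogate pairs.
broken = '\u8a5e\ud840\udc37\ud840\udc81\ud840\udc4d'
fixed = join_surrogates(broken)

print(len(broken), len(fixed))                         # 7 4
print(bool(cjk.match(broken)), bool(cjk.match(fixed)))  # False True
```

In the question's code, this would mean passing join_surrogates(self.lineEdit.text()) to cjk.match instead of the raw text.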

Using python regex, find, and accessing groups

I'm using Python to (1) access an xml file, (2) search it for nodes containing regex1, (3) search the nodes found for regex2 (which has a couple capture groups), then (4) do things with the groups.
I've got steps 1 and 2 working. But I'm stuck on 3 and 4. Here's an example of my code:
from bs4 import BeautifulSoup
from urllib import urlopen
import re
from lxml import etree
url='https://www.gpo.gov/fdsys/bulkdata/BILLS/113/1/hr/BILLS-113hr2146ih.xml'
soup = BeautifulSoup(urlopen(url).read(), 'xml')
pattern = r'(am)(ed)'
regex = re.compile(pattern, re.IGNORECASE)
x = soup.find_all(text=re.compile("amended"))
count = 0
for each in x:
    #I thought this would loop through x and search each result for
    #the regex, then print the 2 groups like this: am--ed
    print (regex.finditer(x[count]))
    print (each.group(1), '--', each.group(2))
    count = count + 1
But instead it prints this:
<callable-iterator object at 0x97efd0c>
Traceback (most recent call last):
File "/media/Windows/Documents and Settings/Andy/My Documents/Misc/Computer/Python/NLTK-Python Learning/test.py", line 17, in <module>
print (each.group(1), '--', each.group(2))
File "/usr/lib/python2.7/dist-packages/bs4/element.py", line 615, in __getattr__
self.__class__.__name__, attr))
AttributeError: 'NavigableString' object has no attribute 'group'
I've been playing with this for a week and have read everything relevant I can find online. But I'm obviously not understanding something. Any suggestions? - Thanks
Currently you aren't using your regex to search through each result of x. Try something like:
for each in x:
    for match in regex.finditer(each):
        print (match.group(1), '--', match.group(2))
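A network-free sketch of the same nested loop, run over plain strings instead of the find_all() results. Note that the question's pattern (am)(ed) only matches the literal text "amed", which does not occur in "amended", so this sketch uses (amend)(ed) as an assumed stand-in. The sample strings and the helper name group_pairs are made up for illustration:

```python
import re

# Assumed stand-in pattern; the question's r'(am)(ed)' would find nothing here.
regex = re.compile(r'(amend)(ed)', re.IGNORECASE)

# Made-up sample strings standing in for the find_all() results.
texts = ['is amended by striking', 'As Amended, the section', 'no match here']


def group_pairs(strings):
    """Return (group 1, group 2) for every match in every string."""
    pairs = []
    for text in strings:
        for match in regex.finditer(text):
            pairs.append((match.group(1), match.group(2)))
    return pairs


for first, second in group_pairs(texts):
    print(first + '--' + second)
```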

Returning error string from a method in python

I was reading a similar question, "Returning error string from a function in python". While experimenting to create something similar in an object-oriented style, so that I could learn a few more things, I got lost.
I am using Python 2.7 and I am a beginner at object-oriented programming.
I cannot figure out how to make it work.
Sample code checkArgumentInput.py:
#!/usr/bin/python
__author__ = 'author'
class Error(Exception):
    """Base class for exceptions in this module."""
    pass
class ArgumentValidationError(Error):
    pass
    def __init__(self, arguments):
        self.arguments = arguments
    def print_method(self, input_arguments):
        if len(input_arguments) != 3:
            raise ArgumentValidationError("Error on argument input!")
        else:
            self.arguments = input_arguments
            return self.arguments
And on the main.py script:
#!/usr/bin/python
import checkArgumentInput
__author__ = 'author'
argsValidation = checkArgumentInput.ArgumentValidationError(sys.argv)
if __name__ == '__main__':
    try:
        result = argsValidation.validate_argument_input(sys.argv)
        print result
    except checkArgumentInput.ArgumentValidationError as exception:
        # handle exception here and get error message
        print exception.message
When I execute the main.py script it prints two blank lines, whether or not I provide any arguments as input.
So my question is how to make it work?
I know that there is a module, argparse, that can do this work for me by checking the argument input, but I want to implement something that I could also use in other cases (try, except).
Thank you in advance for the time and effort reading and replying to my question.
OK. So, sys.argv is a list, not a function: you normally index it with a number in square brackets, like sys.argv[1]. It holds your command-line input. For example, sys.argv[0] is the name of the script.
main.py 42
In this case main.py is sys.argv[0] and 42 is sys.argv[1].
You need to identify which string you are going to take from the command line.
I think that this is the problem.
For more info: https://docs.python.org/2/library/sys.html
I did some research and found this useful question/answer that helped me understand my error: "Manually raising (throwing) an exception in Python".
I am posting the corrected, working code below, in case someone benefits from it in the future.
Sample code checkArgumentInput.py:
#!/usr/bin/python
__author__ = 'author'
class ArgumentLookupError(LookupError):
    pass
    def __init__(self, *args): # *args because I do not know the number of args (input from terminal)
        self.output = None
        self.argument_list = args
    def validate_argument_input(self, argument_input_list):
        if len(argument_input_list) != 3:
            raise ValueError('Error on argument input!')
        else:
            self.output = "Success"
            return self.output
The second part main.py:
#!/usr/bin/python
import sys
import checkArgumentInput
__author__ = 'author'
argsValidation = checkArgumentInput.ArgumentLookupError(sys.argv)
if __name__ == '__main__':
    try:
        result = argsValidation.validate_argument_input(sys.argv)
        print result
    except ValueError as exception:
        # handle exception here and get error message
        print exception.message
The code above prints "Error on argument input!" as expected, because I am violating the condition.
Anyway, thank you all for your time and effort. I hope this answer will help someone else in the future.
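For comparison, a minimal sketch that keeps the exception class and the validation logic separate, so that the exception itself carries the error string. The names ArgumentCountError, validate_argument_count and describe are hypothetical, and str(exc) is used instead of exc.message because the message attribute no longer exists in Python 3:

```python
class ArgumentCountError(ValueError):
    """Raised when the wrong number of arguments is supplied."""


def validate_argument_count(argv, expected=3):
    """Return argv unchanged, or raise ArgumentCountError with a message."""
    if len(argv) != expected:
        raise ArgumentCountError(
            'Error on argument input! Expected %d, got %d'
            % (expected, len(argv)))
    return argv


def describe(argv):
    """Return the validated arguments, or the error string on failure."""
    try:
        return validate_argument_count(argv)
    except ArgumentCountError as exc:
        # The message passed to the exception is recovered with str().
        return str(exc)
```

In main.py this could be called as describe(sys.argv), after importing sys.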

The BeautifulSoup object isn't fetching <ul> tags with class set to comments. Any suggestions?

>>> from bs4 import BeautifulSoup
>>> import urllib
>>> url = "http://www.securitytube.net/video/7313"
>>>
>>> page = urllib.urlopen(url)
>>>
>>> pageDom = BeautifulSoup(page)
On running the above code, I receive the dom object in the 'pageDom' variable. Now I do this (code mentioned below) and I get an empty list.
>>> allComments = pageDom.find_all("ul", class_="comments")
>>>
>>> allComments
[]
>>>
>>>
So now I removed 'class_' and am able to fetch all the unordered list tags.
Check the code below.
>>> allComments = pageDom.find_all("ul")
>>> len(allComments)
27
>>>
If I look at the source code of the page I can clearly see all the <ul> tags with the class "comments". I don't know what I am missing. I also tried changing the parser to "lxml", but no joy.
Any suggestions/ improvements will be highly appreciated ...
I am not sure whether there is a difference between versions, but here is the code and the output that worked fine with Python 3.4:
from bs4 import BeautifulSoup
import urllib.request
url = "http://www.securitytube.net/video/7313"
page = urllib.request.urlopen(url)
pageDom = BeautifulSoup(page)
#print(pageDom)
# Search with the class filter, as in the question
allComments = pageDom.find_all("ul", class_="comments")
#print(allComments)
print(len(allComments))
# Search without 'class_' to fetch all the unordered list tags
allComments = pageDom.find_all("ul")
#print(allComments)
print(len(allComments))
Output:
C:\Python34\python.exe C:/{path}/testPython.py
2
27
Process finished with exit code 0
You can uncomment the print lines to see the list contents.
I tested (multiple times) in Python 2.7 32-bit:
from bs4 import BeautifulSoup
import urllib
url = "http://www.securitytube.net/video/7313"
page = urllib.urlopen(url)
page = page.read()
pageDom = BeautifulSoup(page,'lxml')
allComments = pageDom.find_all("ul", class_="comments")
print len(allComments)
allComments = pageDom.find_all("ul")
print len(allComments)
It prints:
2
27
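To check whether the class is really present in the markup the server returned, independently of BeautifulSoup and its parser, a standard-library sketch like the following can count the matching tags. It is written for Python 3 (the module is named HTMLParser on Python 2). ClassCounter, count_tags and the sample HTML are made up for illustration:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        HTMLParser.__init__(self)
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get('class') or '').split()
        if self.css_class in classes:
            self.count += 1


def count_tags(html, tag, css_class):
    """Return how many <tag> elements in html have css_class."""
    counter = ClassCounter(tag, css_class)
    counter.feed(html)
    return counter.count


sample = ('<ul class="comments"><li>a</li></ul>'
          '<ul><li>b</li></ul>'
          '<ul class="comments extra"></ul>')
print(count_tags(sample, 'ul', 'comments'))  # 2
```

If this count is non-zero for the downloaded page while find_all returns an empty list, the difference is more likely to lie in the parser or library version than in the page itself.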