I have the following piece of code, copied from page 118 of the book Programming Collective Intelligence (chapter "Document Filtering"). The function breaks the text up into words by splitting on any character that isn't a letter. This leaves only actual words, all converted to lower case.
import re
import math
def getwords(doc):
    splitter=re.compile('\\W*')
    words=[s.lower() for s in splitter.split(doc)
           if len(s)>2 and len(s)<20]
    return dict([(w,1) for w in words])
I implemented the function and got the following error:
>>> import docclass
>>> t=docclass.getwords(s)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t=docclass.getwords(s)
  File "docclass.py", line 6, in getwords
    words=[s.lower() for s in splitter.split(doc)
NameError: global name 'splitter' is not defined
It works here:
>>> import re
>>>
>>> def getwords(doc):
...     splitter=re.compile('\\W*')
...     words=[s.lower() for s in splitter.split(doc)
...            if len(s)>2 and len(s)<20]
...     return dict([(w,1) for w in words])
...
>>> getwords ("He's fallen in the water!");
{'water': 1, 'the': 1, 'fallen': 1}
I'm guessing you made a typo in your own file (for example in the name splitter) but got it right when you pasted the code here.
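One caveat if the code is ever run on Python 3.7 or later: re.split there also splits on empty matches, so a pattern like '\\W*' breaks the text between every character and nothing passes the length filter. A minimal sketch of the same function using '\W+' instead, which should behave the same on Python 2 and 3 (the change of pattern is my adjustment, not the book's code):

```python
import re

# Compile once at module level; \W+ requires at least one non-word
# character, so it never matches the empty string.
SPLITTER = re.compile(r'\W+')

def getwords(doc):
    # Keep lower-cased tokens whose length is between 3 and 19 characters.
    words = [s.lower() for s in SPLITTER.split(doc)
             if 2 < len(s) < 20]
    # Map each distinct word to 1, as in the original function.
    return dict((w, 1) for w in words)

result = getwords("He's fallen in the water!")
```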
This works:
ss = 'insert into images (file_path) values(?);'
dddd = (('dd1',), ('dd2',))
conn.executemany(ss, dddd)
However this does not:
s = 'insert into images (file_path) values (:v)'
ddddd = ({':v': 'dd11'}, {':v': 'dd22'})
conn.executemany(s, ddddd)
Traceback (most recent call last):
  File "/Users/Wes/.virtualenvs/ppyy/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3035, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-a999de59f73b>", line 1, in <module>
    conn.executemany(s, ddddd)
ProgrammingError: You did not supply a value for binding 1.
I am wondering whether it is possible to use named parameters with executemany and, if so, how.
The documentation at section 11.13.3 talks generally about parameters, but for executemany it doesn't discuss the two parameter styles (qmark and named) that are described for the other .executexxx() methods.
I have looked at the question "Python sqlite3 execute with both named and qmark parameters", but it does not cover executemany.
The source shows that execute() simply constructs a one-element list and calls executemany(), so the problem is not with executemany() itself; the same call fails with execute():
>>> conn.execute('SELECT :v', {':v': 42})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: You did not supply a value for binding 1.
As shown in the Python documentation, named parameters do not include the colon:
# And this is the named style:
cur.execute("select * from people where name_last=:who and age=:age", {"who": who, "age": age})
So you have to use ddddd = ({'v': 'dd11'}, {'v': 'dd22'}).
The : isn't part of the parameter name.
>>> s = 'insert into images (file_path) values (:v)'
>>> ddddd = ({'v': 'dd11'}, {'v': 'dd22'})
>>> conn.executemany(s, ddddd)
<sqlite3.Cursor object at 0x0000000002C0E500>
>>> conn.execute('select * from images').fetchall()
[(u'dd11',), (u'dd22',)]
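For anyone who wants to try this without an existing database, here is a small self-contained sketch against an in-memory SQLite database. The table layout is only assumed from the insert statement in the question:

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
# Assumed schema: a single text column, matching the question's insert.
conn.execute("create table images (file_path text)")

# One dict per row; the keys are the parameter names WITHOUT the colon.
rows = [{"v": "dd11"}, {"v": "dd22"}]
conn.executemany("insert into images (file_path) values (:v)", rows)

result = conn.execute(
    "select file_path from images order by file_path"
).fetchall()
conn.close()
```

The colon appears only in the SQL text, where it marks the placeholder; the dictionary keys are the bare names.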
I am trying to understand, and work around, why the following happens:
$ python
>>> import struct
>>> list(struct.pack('hh', *(50,50)))
['2', '\x00', '2', '\x00']
>>> exit()
$ python3
>>> import struct
>>> list(struct.pack('hh', *(50, 50)))
[50, 0, 50, 0]
I understand that hh stands for two shorts, and that struct.pack converts the two integers (shorts) to a C-style struct. But why does the output in 2.7 differ so much from 3.5?
Unfortunately, I am stuck with Python 2.7 on this project for now, and I need the output to match the one from Python 3.5.
In response to the comment from Some Programmer Dude:
$ python
>>> import struct
>>> a = list(struct.pack('hh', *(50, 50)))
>>> [int(_) for _ in a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
In Python 2, struct.pack('hh', *(50,50)) returns a str object, so list() gives you one-character strings ('2' is the character with code 50).
This changed in Python 3, where it returns a bytes object, and iterating over bytes yields integers. The distinction between binary data and text is one of the most important differences between the two versions; bytes does exist in Python 2, but there it is just an alias for str.
To emulate the Python 3 behaviour in Python 2, get the integer value of each byte by applying ord to each character of the result:
map(ord,struct.pack('hh', *(50,50)))
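If the same code has to run on both versions, one option is to go through bytearray, which yields integers when iterated on Python 2 and Python 3 alike. A minimal sketch; the helper name pack_as_ints is made up for illustration, and the '<' prefix (explicit little-endian) is my addition so the result does not depend on the machine's native byte order:

```python
import struct

def pack_as_ints(fmt, *values):
    # struct.pack returns str on Python 2 and bytes on Python 3;
    # bytearray iterates as integers on both.
    packed = struct.pack(fmt, *values)
    return list(bytearray(packed))

# '<hh' = two little-endian signed 16-bit integers.
result = pack_as_ints('<hh', 50, 50)
```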
I am using Python 2.7.8 to write a small piece of code that uses a regex to read a rule of the form A ==> B and return it in the form 'A, B'.
This is my code:
import re
def fixp1(s):
    pattern = re.compile("(?P<g1>([A-Z0-9a-z]|\?)*):(?P<g2>([A-Z0-9a-z]|\?)*)")
    return eval(pattern.sub("('\g<g1>', '\g<g2>')", s))
x = "[ABCD:NP, [PQR:?TAG1]] ==> [XXX:?P]"
def readrule(r):
    r.split("==>")
    return [fixp1(r[0].strip()), fixp1(r[1].strip())]
When I test this code:
>>> readrule(x)
I got the following error message:
readrule(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../patterns.py", line 12, in readrule
    return [fixp1(r[0].strip()), fixp1(r[1].strip())]
  File ".../patterns.py", line 5, in fixp1
    return eval(pattern.sub("('\g<g1>', '\g<g2>')", s))
  File "<string>", line 1
    [
    ^
SyntaxError: unexpected EOF while parsing
>>>
I think this problem happens because I couldn't add '[' and ']' to this part of the pattern:
([A-Z0-9a-z]|\?)
If that's right, how do I do it? If not, where is my mistake?
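The traceback shows that eval received only "[". That happens because readrule throws away the result of r.split("==>"): r is still the original string, so r[0] is its first character, '['. Assign the result of split to a variable and index that instead. I would also drop the eval call: re.sub returns a string with the replacements applied, and evaluating that string as Python is fragile, which is how you end up with a SyntaxError like this one.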
If you want to include [] in your patterns, you need to escape them with \:
pattern = re.compile(r'[\[\]0-9]+')
would match strings like '[1234]'.
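Here is a minimal sketch of a readrule that keeps the result of split and parses the rewritten text with ast.literal_eval in place of eval. The character class is a simplification of the one in the question, and the choice of literal_eval is my suggestion (it only accepts Python literals such as lists, tuples and strings):

```python
import ast
import re

# NAME:VALUE pairs; letters, digits and '?' are allowed on either side.
PAIR = re.compile(r"(?P<g1>[A-Za-z0-9?]*):(?P<g2>[A-Za-z0-9?]*)")

def fixp1(s):
    # Rewrite every NAME:VALUE into the text ('NAME', 'VALUE') ...
    quoted = PAIR.sub(r"('\g<g1>', '\g<g2>')", s)
    # ... then parse the result as a Python literal.
    return ast.literal_eval(quoted)

def readrule(rule):
    # str.split returns a new list; the original string is unchanged.
    left, right = rule.split("==>")
    return [fixp1(left.strip()), fixp1(right.strip())]

result = readrule("[ABCD:NP, [PQR:?TAG1]] ==> [XXX:?P]")
```

The brackets never need to appear in the pattern here: they are left untouched by the substitution and are parsed as list syntax afterwards.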
I installed mailgun/talon on GCE and was trying out the example in the README section, but it threw the following error at me:
>>> from talon import signature
>>> message = """Thanks Sasha, I can't go any higher and is why I limited it to the
... homepage.
...
... John Doe
... via mobile"""
>>> message
"Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage.\n\nJohn Doe\nvia mobile"
>>> text,signtr = signature.extract(message, sender='john.doe#example.com')
ERROR:talon.signature.extraction:ERROR when extracting signature with classifiers
Traceback (most recent call last):
  File "talon/signature/extraction.py", line 57, in extract
    markers = _mark_lines(lines, sender)
  File "talon/signature/extraction.py", line 99, in _mark_lines
    elif is_signature_line(line, sender, EXTRACTOR):
  File "talon/signature/extraction.py", line 40, in is_signature_line
    return classifier.decisionFunc(data, 0) > 0
AttributeError: 'NoneType' object has no attribute 'decisionFunc'
Do I need to train the model somehow (this signature seems to be the ML example)? I installed it using pip.
If you want to use signature parsing with classifiers you just need to call talon.init() before using the lib - it loads trained classifiers. Other methods like talon.signature.bruteforce.extract_signature() or talon.quotations.extract_from() don't require classifiers. Here's a full code sample:
import talon
# don't forget to init the library first
# it loads machine learning classifiers
talon.init()

from talon import signature

message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.

John Doe
via mobile"""

text, signature = signature.extract(message, sender='john.doe#example.com')
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# signature == "John Doe\nvia mobile"
Today I was coding and I ran into this unusual error.
Here is my code:
from direct.showbase.ShowBase import ShowBase
import cogManager

class application(ShowBase):
    def __init__(self):
        ShowBase.__init__(self)

playApplication = application()
playApplication.run()
Error:
Traceback (most recent call last):
  File "CogCreator.py", line 2, in <module>
    import cogManager
  File "C:\Users\GeekyGamerGavin\Documents\Toontown Phase Files\NEW\cogManager.py", line 4

    ^
IndentationError: expected an indented block
But the code works when I remove
import cogManager
Could I have some help? I'm confused!
EDIT: I don't have spaces and tabs mixed!
EDIT: Fixed it. Thanks!
You probably have a tab on an empty line, or are mixing tabs and spaces. Following the PEP 8 coding standard, indentation should be 4 spaces per level.
Try indenting def __init__(self): under the class:
from direct.showbase.ShowBase import ShowBase
import cogManager

class application(ShowBase):
    def __init__(self):
        ShowBase.__init__(self)

playApplication = application()
playApplication.run()
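Since the traceback points at line 4 of cogManager.py and not at the script above, it can help to check that module on its own. A small sketch using the built-in compile(); the helper name check_source is made up for illustration, and the broken snippet only imitates the kind of mistake that raises this error:

```python
def check_source(source, filename="<string>"):
    """Return None if the source compiles, else (line_number, message)."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # IndentationError and TabError are subclasses of SyntaxError.
        return err.lineno, err.msg
    return None

# A method that is not indented under its class.
broken = "class A(object):\ndef f(self):\n    pass\n"
fixed = "class A(object):\n    def f(self):\n        pass\n"

problem = check_source(broken)
ok = check_source(fixed)
```

To check the real file, read it and pass its contents to the helper, or run python -m py_compile cogManager.py from the command line, which should report the same error without starting the application.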