Mailgun Talon: Signature extraction example throwing error

I installed mailgun/talon on GCE and was trying out the example in the README section, but it threw the following error at me:
>>> from talon import signature
>>> message = """Thanks Sasha, I can't go any higher and is why I limited it to the
... homepage.
...
... John Doe
... via mobile"""
>>> message
"Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage.\n\nJohn Doe\nvia mobile"
>>> text, signtr = signature.extract(message, sender='john.doe@example.com')
ERROR:talon.signature.extraction:ERROR when extracting signature with classifiers
Traceback (most recent call last):
  File "talon/signature/extraction.py", line 57, in extract
    markers = _mark_lines(lines, sender)
  File "talon/signature/extraction.py", line 99, in _mark_lines
    elif is_signature_line(line, sender, EXTRACTOR):
  File "talon/signature/extraction.py", line 40, in is_signature_line
    return classifier.decisionFunc(data, 0) > 0
AttributeError: 'NoneType' object has no attribute 'decisionFunc'
Do I need to train the model somehow (signature.extract seems to be the machine-learning example)? I installed the library using pip.

If you want to use signature parsing with classifiers, you just need to call talon.init() before using the lib; it loads the trained classifiers. Other methods, like talon.signature.bruteforce.extract_signature() or talon.quotations.extract_from(), don't require classifiers. Here's a full code sample:
import talon
# don't forget to init the library first
# it loads machine learning classifiers
talon.init()
from talon import signature
message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.

John Doe
via mobile"""
# note: rebinding the name "signature" would shadow the imported module,
# so the result is unpacked into "sig" instead
text, sig = signature.extract(message, sender='john.doe@example.com')
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# sig == "John Doe\nvia mobile"
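For completeness, here is a minimal sketch of the classifier-free path, using the brute-force extractor mentioned above. No talon.init() call is needed here, since it relies on heuristics rather than trained models:

from talon.signature.bruteforce import extract_signature

message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.

John Doe
via mobile"""

# heuristic extraction: no classifier loading, so no talon.init() required
text, sig = extract_signature(message)
# text should again be the message body, sig the trailing signature block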

Related

Google Vision API 'TypeError: invalid file'

The following piece of code comes from Google's Vision API documentation; the only modification I've made is adding the argument parser at the bottom.
import argparse
import os
from google.cloud import vision
import io


def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations

    print('Texts:')
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = (['({},{})'.format(vertex.x, vertex.y)
                     for vertex in text.bounding_poly.vertices])
        print('bounds: {}'.format(','.join(vertices)))


ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
                help="path to input image")
args = vars(ap.parse_args())
detect_text(args)
If I run it from a terminal like below, I get this invalid file error:
PS C:\VisionTest> python visionTest.py --image C:\VisionTest\test.png
Traceback (most recent call last):
  File "visionTest.py", line 31, in <module>
    detect_text(args)
  File "visionTest.py", line 10, in detect_text
    with io.open(path, 'rb') as image_file:
TypeError: invalid file: {'image': 'C:\\VisionTest\\test.png'}
I've tried with various images and image types as well as running the code from different locations with no success.
The traceback shows the real problem: io.open is being handed the whole argument dictionary, not a path (note the {'image': 'C:\\VisionTest\\test.png'} in the error message). vars(ap.parse_args()) returns a dict, so you need to pass the path entry itself: call detect_text(args["image"]) instead of detect_text(args).
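A minimal sketch of the corrected call site; only the bottom of the script changes:

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
                help="path to input image")
args = vars(ap.parse_args())

# vars() returns a dict like {'image': 'C:\\VisionTest\\test.png'};
# hand detect_text the path string itself, not the whole dict
detect_text(args["image"])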

OSMnx gives "TypeError: query must be a string or a list of query strings" on basic use

After following the instructions for installation of OSMnx (including explicitly installing spatialindex) with
brew install spatialindex
pip install osmnx
running the very first basic example of
import osmnx as ox
G = ox.graph_from_place('Manhattan Island, New York City, New York, USA', network_type='drive')
ox.plot_graph(ox.project_graph(G))
in the project's readme, I get
Traceback (most recent call last):
  File "/Users/Rax/Documents/Projects/Coding/Python/maps/test.py", line 23, in <module>
    G = ox.graph_from_place('Manhattan Island, New York City, New York, USA', network_type='drive')
  File "/usr/local/lib/python2.7/site-packages/osmnx/core.py", line 1850, in graph_from_place
    raise TypeError('query must be a string or a list of query strings')
TypeError: query must be a string or a list of query strings
How do I get OSMnx to run past this error?
This can be caused by having
from __future__ import unicode_literals
in your code, since that import turns every string literal into type unicode, while the API expects arguments of type str. If it is present, removing it will prevent the error from occurring.
See also: https://github.com/gboeing/osmnx/issues/185
OSMnx is compatible with Python 2 and 3, so you don't need the __future__ import to use it. If you use Python 2 and import unicode_literals from __future__, all of your string literals will be of type unicode instead of str. As you can see in the documentation, graph_from_place expects the query to be of type string, not of type unicode.
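If you do need unicode_literals elsewhere in the module, a minimal workaround sketch (assuming Python 2 and an ASCII-only query) is to cast the literal back to a byte string explicitly, since the __future__ import only affects literals:

from __future__ import unicode_literals
import osmnx as ox

# str() returns a Python 2 byte string even with unicode_literals active
place = str('Manhattan Island, New York City, New York, USA')
G = ox.graph_from_place(place, network_type='drive')
ox.plot_graph(ox.project_graph(G))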

Error translating Tornado template with gettext

I've got this site running on Tornado with its template engine, and I want to internationalize it, so I thought of using gettext to help me with that.
Since my site is already in Portuguese, my message.po (template) file has all its msgids in Portuguese as well, for example:
#: base.html:30 base.html:51
msgid "Início"
msgstr ""
It was generated with xgettext:
xgettext -i *.html -L Python --from-code UTF-8
Later I used Poedit to generate the translation file en_US.po and then compiled it to en_US.mo.
Stored in my translation folder:
translation/en_US/LC_MESSAGES/site.mo
So far, so good.
I've created a really simple RequestHandler that would render and return the translated site.
import os
import logging

from tornado.web import RequestHandler
import tornado.locale as locale

LOG = logging.getLogger(__name__)


class SiteHandler(RequestHandler):
    def initialize(self):
        locale.load_gettext_translations(
            os.path.join(os.path.dirname(__file__), '../translations'), "site")

    def get(self, page):
        LOG.debug("PAGE REQUESTED: %s", page)
        self.render("site/%s.html" % page)
As far as I know that should work perfectly, but somehow I've encountered some issues:
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
TypeError: not all arguments converted during string formatting
ERROR:views.site:Could not load template
Traceback (most recent call last):
  File "/Users/ademarizu/Dev/git/new_plugin/site/src/main/py/views/site.py", line 20, in get
    self.render("site/%s.html" %page)
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 664, in render
    html = self.render_string(template_name, **kwargs)
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 771, in render_string
    return t.generate(**namespace)
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/template.py", line 278, in generate
    return execute()
  File "site/home_html.generated.py", line 11, in _tt_execute
    _tt_tmp = _("Início")  # site/base.html:30
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/locale.py", line 446, in translate
    return self.gettext(message)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 406, in ugettext
    return self._fallback.ugettext(message)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 407, in ugettext
    return unicode(message)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
Any help?
Ah, I'm running python 2.7 btw!
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
This is what the set_default_locale method is for. Call tornado.locale.set_default_locale('pt') (or pt_BR, etc) once at startup to tell tornado that your template source is in Portuguese.
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
Remember that in Python 2, string literals containing non-ASCII characters need to be marked as unicode. Instead of _("Início"), use _(u"Início").
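Putting both fixes together, a minimal sketch of the setup (the coding declaration is required for the non-ASCII literal in a Python 2 source file; 'pt' as the locale code is an assumption, use pt_BR if that matches your catalogs):

# -*- coding: utf-8 -*-
import tornado.locale as locale

# declare the language of the template source itself, so Tornado
# won't go looking for a 'pt' catalog you don't have
locale.set_default_locale('pt')
locale.load_gettext_translations('translations', 'site')

# and in templates/handlers, mark non-ASCII literals as unicode:
# _(u"Início") instead of _("Início")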

name from Orange is not defined

I've set up Orange and tried to execute this code in PythonWin, but got an error on the 2nd line.
Was my setup of Orange incomplete, or is it something else?
>>> from Orange.data import *
>>> color = DiscreteVariable("color", values=["orange", "green", "yellow"])
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
NameError: name 'DiscreteVariable' is not defined
I'm not sure what the guy in the blog post is doing (maybe there are other steps he explained in previous blog posts), but this code as-is is not going to work.
I searched the source code for Orange, and DiscreteVariable isn't mentioned anywhere: not as a class, not as a regular word, nothing.
What I did find however is
Discrete = core.EnumVariable
in Orange/feature/__init__.py. As you can see, this points to core.EnumVariable, which, judging from its usage:
orange.EnumVariable('color', values = ["green", "red"])
appears to be the same as DiscreteVariable in your link.
So I suggest you import Discrete from Orange.feature instead and use that.
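A minimal sketch of what that looks like under the Orange 2.x names, reusing the constructor arguments from the EnumVariable usage quoted above:

from Orange.feature import Discrete

# Discrete is an alias for core.EnumVariable in Orange 2.x
color = Discrete("color", values=["orange", "green", "yellow"])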

NLTK getting dependencies from raw text

I need to get dependencies for sentences from raw text using NLTK.
As far as I understood, the Stanford parser just lets us create a tree, and I couldn't find out how to get dependencies in sentences from this tree (maybe it's possible, maybe not).
So I've started using MaltParser. Here is a piece of the code I'm using:
import os

from nltk.parse.stanford import StanfordParser
from nltk.tokenize import sent_tokenize
from nltk.parse.dependencygraph import DependencyGraph
from nltk.parse.malt import MaltParser

os.environ['JAVAHOME'] = r"C:\Program Files (x86)\Java\jre1.8.0_45\bin\java.exe"
os.environ['MALT_PARSER'] = r"C:\maltparser-1.8.1"

maltParser = MaltParser(r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco")


class Parser(object):
    @staticmethod
    def Parse(text):
        rawSentences = sent_tokenize(text)
        treeSentencesStanford = stanfordParser.raw_parse_sents(rawSentences)
        a = maltParser.raw_parse(rawSentences[0])
but the last line throws the exception "'str' object has no attribute 'tag'".
Changing the code above like this:
from nltk.tokenize import word_tokenize  # also needed for this variant

rawSentences = sent_tokenize(text)
treeSentencesStanford = stanfordParser.raw_parse_sents(rawSentences)
splitedSentences = []
for sentence in rawSentences:
    splitedSentence = word_tokenize(sentence)
    splitedSentences.append(splitedSentence)
a = maltParser.parse_sents(splitedSentences)
throws the same exception.
So, what am I doing wrong?
And in general: am I going the right way to get dependencies like this: http://www.nltk.org/images/depgraph0.png (but I need to access these dependencies from code)?
Traceback (most recent call last):
  File "E:\Google drive\Python multi tries\Python multi tries\Parser.py", line 51, in <module>
    Parser.Parse("Some random sentence. Hopefully it will be parsed.")
  File "E:\Google drive\Python multi tries\Python multi tries\Parser.py", line 32, in Parse
    a=maltParser.parse_sents(splitedSentences)
  File "C:\Python27\lib\site-packages\nltk-3.0.1-py2.7.egg\nltk\parse\malt.py", line 113, in parse_sents
    tagged_sentences = [self.tagger.tag(sentence) for sentence in sentences]
AttributeError: 'str' object has no attribute 'tag'
You are instantiating MaltParser with an unsuitable argument.
Running help(MaltParser) gives the following information:
Help on class MaltParser in module nltk.parse.malt:
class MaltParser(nltk.parse.api.ParserI)
| Method resolution order:
| MaltParser
| nltk.parse.api.ParserI
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, tagger=None, mco=None, working_dir=None, additional_java_args=None)
| An interface for parsing with the Malt Parser.
|
| :param mco: The name of the pre-trained model. If provided, training
| will not be required, and MaltParser will use the model file in
| ${working_dir}/${mco}.mco.
| :type mco: str
...
So when you call maltParser = MaltParser(r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco"), the first positional argument, tagger, is set to the path of the pre-trained model.
Unfortunately this argument is not documented, but apparently it is a PoS tagger, as can be seen from inspecting the source.
(You don't have to specify a PoS tagger; there's a default RegEx-based tagger for English hard-coded in that class.)
So change your code to maltParser = MaltParser(mco=r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco"), and you should be fine (at least until you find the next bug).
Your other questions: I think you're on the right track. If you're interested in dependencies, it's probably best to actually use dependency parsing, just as you are doing now. It is indeed possible to transform constituency parses into dependencies (this has been done before), but it's probably more work.
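A minimal sketch of the corrected instantiation; splitting the path into working_dir and model name follows the ${working_dir}/${mco}.mco convention in the docstring quoted above, and raw_parse returning a dependency graph is assumed from the NLTK 3.0.x API:

from nltk.parse.malt import MaltParser

# pass the model via the mco keyword so the default tagger is kept;
# the parser resolves the file as ${working_dir}/${mco}.mco
maltParser = MaltParser(working_dir=r"C:\maltparser-1.8.1",
                        mco="engmalt.poly-1.7")

graph = maltParser.raw_parse("Some random sentence.")
print(graph.tree())  # or walk the graph's nodes for head/dependent pairs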