NLTK getting dependencies from raw text - python-2.7

I need to get the dependencies in sentences from raw text using NLTK.
As far as I understand, the Stanford parser only lets us create a tree, and I didn't find out how to get the dependencies in sentences from this tree (maybe it's possible, maybe not).
So I've started using MaltParser. Here is a piece of the code I'm using:
import os
from nltk.parse.stanford import StanfordParser
from nltk.tokenize import sent_tokenize
from nltk.parse.dependencygraph import DependencyGraph
from nltk.parse.malt import MaltParser
os.environ['JAVAHOME'] = r"C:\Program Files (x86)\Java\jre1.8.0_45\bin\java.exe"
os.environ['MALT_PARSER'] = r"C:\maltparser-1.8.1"
maltParser = MaltParser(r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco")
# stanfordParser is created elsewhere in the file (not shown here)
class Parser(object):
    @staticmethod
    def Parse(text):
        rawSentences = sent_tokenize(text)
        treeSentencesStanford = stanfordParser.raw_parse_sents(rawSentences)
        a = maltParser.raw_parse(rawSentences[0])
but the last line throws the exception "'str' object has no attribute 'tag'".
Changing the code above like this:
rawSentences = sent_tokenize(text)
treeSentencesStanford = stanfordParser.raw_parse_sents(rawSentences)
splitedSentences = []
for sentence in rawSentences:
    splitedSentence = word_tokenize(sentence)
    splitedSentences.append(splitedSentence)
a = maltParser.parse_sents(splitedSentences)
throws the same exception.
So, what am I doing wrong?
And in general: am I going the right way to get dependencies like this: http://www.nltk.org/images/depgraph0.png (I need to access these dependencies from code)?
Traceback (most recent call last):
File "E:\Google drive\Python multi tries\Python multi tries\Parser.py", line 51, in <module>
Parser.Parse("Some random sentence. Hopefully it will be parsed.")
File "E:\Google drive\Python multi tries\Python multi tries\Parser.py", line 32, in Parse
a=maltParser.parse_sents(splitedSentences)
File "C:\Python27\lib\site-packages\nltk-3.0.1-py2.7.egg\nltk\parse\malt.py", line 113, in parse_sents
tagged_sentences = [self.tagger.tag(sentence) for sentence in sentences]
AttributeError: 'str' object has no attribute 'tag'

You are instantiating MaltParser with an unsuitable argument.
Running help(MaltParser) gives the following information:
Help on class MaltParser in module nltk.parse.malt:
class MaltParser(nltk.parse.api.ParserI)
| Method resolution order:
| MaltParser
| nltk.parse.api.ParserI
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, tagger=None, mco=None, working_dir=None, additional_java_args=None)
| An interface for parsing with the Malt Parser.
|
| :param mco: The name of the pre-trained model. If provided, training
| will not be required, and MaltParser will use the model file in
| ${working_dir}/${mco}.mco.
| :type mco: str
...
So when you call maltParser = MaltParser(r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco"), the path to the pretrained model is passed as the first positional argument and is therefore bound to tagger, not to mco.
Unfortunately the tagger argument is not documented, but inspecting the source shows that it is expected to be a PoS tagger object. That explains your exception: parse_sents calls self.tagger.tag(sentence), and self.tagger is your path string, which has no tag method.
(You don't have to specify a PoS tagger; there's a default regex-based tagger for English hard-coded in that class.)
So change your code to maltParser = MaltParser(mco=r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco"), and you should be fine (at least until you find the next bug).
Your other questions: I think you're on the right track. If you're interested in dependencies, it's probably best to use dependency parsing directly, as you are doing now. It is indeed possible to convert constituency parses into dependencies (this is well established), but it's probably more work.

Related

PyYAML shows "ScannerError: mapping values are not allowed here" in my unittest

I am trying to test a number of Python 2.7 classes using unittest.
Here is the exception:
ScannerError: mapping values are not allowed here
in "<unicode string>", line 3, column 32:
... file1_with_path: '../../testdata/concat1.csv'
Here is the example the error message relates to:
class TestConcatTransform(unittest.TestCase):
    def setUp(self):
        filename1 = os.path.dirname(os.path.realpath(__file__)) + '/../../testdata/concat1.pkl'
        self.df1 = pd.read_pickle(filename1)
        filename2 = os.path.dirname(os.path.realpath(__file__)) + '/../../testdata/concat2.pkl'
        self.df2 = pd.read_pickle(filename2)
        self.yamlconfig = u'''
        --- !ConcatTransform
        file1_with_path: '../../testdata/concat1.csv'
        file2_with_path: '../../testdata/concat2.csv'
        skip_header_lines: [0]
        duplicates: ['%allcolumns']
        outtype: 'dataframe'
        client: 'testdata'
        addcolumn: []
        '''
        self.testconcat = yaml.load(self.yamlconfig)
What is the problem?
Something else that is not clear to me: my directory structure is
app
app/etl
app/tests
The ConcatTransform is in app/etl/concattransform.py and TestConcatTransform is in app/tests. I import ConcatTransform into the TestConcatTransform unittest with this import:
from app.etl import concattransform
How does PyYAML associate that class with the one defined in yamlconfig?
A YAML document can start with a document start marker ---, but that marker has to be at the beginning of a line, and yours is indented eight positions on the second line of the input. That causes the --- to be interpreted as the beginning of a multi-line plain (i.e. non-quoted) scalar, and within such a scalar you cannot have a colon followed by a space (": "); that combination is only allowed in quoted scalars. And since your document then has neither a mapping nor a sequence at the root level, the whole document can only consist of a single scalar, so the ": " on line 3 is rejected.
If you want to keep your sources nicely indented as you have now, I recommend using dedent from textwrap.
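The error can be reproduced with PyYAML alone. This minimal sketch uses yaml.compose(), which only scans, parses and builds the node graph, so no constructor for !ConcatTransform is needed; the config is trimmed to two keys for brevity.

```python
from __future__ import print_function

from textwrap import dedent

import yaml

body = u'''
        --- !ConcatTransform
        file1_with_path: '../../testdata/concat1.csv'
        addcolumn: []
        '''

# Indented: '---' is not at column 0, so it starts a plain scalar,
# and the ': ' on the next line cannot appear inside that scalar.
try:
    yaml.compose(body, Loader=yaml.SafeLoader)
except yaml.scanner.ScannerError as e:
    print(e.problem)  # mapping values are not allowed here

# Dedented: '---' is a real document start and the root is a tagged mapping.
node = yaml.compose(dedent(body), Loader=yaml.SafeLoader)
print(node.tag, type(node).__name__)  # !ConcatTransform MappingNode
```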
The following runs without error:
import ruamel.yaml
from textwrap import dedent
yaml_config = dedent(u'''\
    --- !ConcatTransform
    file1_with_path: '../../testdata/concat1.csv'
    file2_with_path: '../../testdata/concat2.csv'
    skip_header_lines: [0]
    duplicates: ['%allcolumns']
    outtype: 'dataframe'
    client: 'testdata'
    addcolumn: []
    ''')
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_config)
You should get into the habit of putting a backslash (\) right after the opening triple quotes, so that your YAML document starts on the first line of the string. Had you done that, the error would actually have pointed at line 2, because the document would no longer start with an empty line.
While loading, the YAML parser encounters the tag !ConcatTransform. Most likely, importing concattransform registers a constructor with the PyYAML loader, using PyYAML's add_constructor to associate that tag with the ConcatTransform class.
Unfortunately, they registered their constructor with the default, non-safe loader. That is not necessary: they could have registered it with the SafeLoader instead, and thereby not force users to risk problems with uncontrolled input.
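A quick standard-library check of what the backslash changes, using a trimmed copy of the config (the YAML is indented here on purpose, to show that dedent removes the common indentation):

```python
from textwrap import dedent

# With the backslash, the string starts directly with the '---' line.
with_backslash = dedent(u'''\
    --- !ConcatTransform
    file1_with_path: '../../testdata/concat1.csv'
    addcolumn: []
    ''')

# Without it, the string starts with a newline: line 1 is empty
# and the YAML document only begins on line 2.
without_backslash = dedent(u'''
    --- !ConcatTransform
    file1_with_path: '../../testdata/concat1.csv'
    addcolumn: []
    ''')

first_with = with_backslash.splitlines()[0]        # '--- !ConcatTransform'
first_without = without_backslash.splitlines()[0]  # '' (empty line)
```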
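For comparison, here is a sketch of registering a constructor with SafeLoader, so that yaml.safe_load() can handle the tag. The ConcatTransform class below is a hypothetical minimal stand-in, not the real app.etl.concattransform class, and the construct_concat_transform function is likewise invented for illustration.

```python
from __future__ import print_function

import yaml


class ConcatTransform(object):
    # Hypothetical stand-in: just stores the mapping's keys as attributes
    def __init__(self, **settings):
        self.__dict__.update(settings)


def construct_concat_transform(loader, node):
    # Build the object from the mapping under the !ConcatTransform tag
    return ConcatTransform(**loader.construct_mapping(node, deep=True))


# Register with SafeLoader rather than the default (unsafe) Loader
yaml.add_constructor(u'!ConcatTransform', construct_concat_transform,
                     Loader=yaml.SafeLoader)

doc = u"""\
--- !ConcatTransform
file1_with_path: '../../testdata/concat1.csv'
skip_header_lines: [0]
addcolumn: []
"""

obj = yaml.safe_load(doc)
print(type(obj).__name__, obj.file1_with_path, obj.skip_header_lines)
```

Without the registration, yaml.safe_load() would refuse the unknown tag instead of constructing arbitrary objects, which is the safety property the answer refers to.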

How to solve AttributeError in python active_directory?

Running the script below works for about 60% of the entries in the MasterGroupList, but then suddenly fails with the error below. Although my questions may seem poor, you guys have been able to help me before. Any idea how I can avoid this error, or what is throwing off the script? The MasterGroupList looks like this:
Groups Pulled from AD
SET00 POWERUSER
SET00 USERS
SEF00 CREATORS
SEF00 USERS
...another 300 entries...
Error:
Traceback (most recent call last):
File "C:\Users\ks185278\OneDrive - NCR Corporation\Active Directory Access Script\test.py", line 44, in <module>
print group.member
File "C:\Python27\lib\site-packages\active_directory.py", line 805, in __getattr__
raise AttributeError
AttributeError
Code:
from active_directory import *
import os
file = open("C:\Users\NAME\Active Directory Access Script\MasterGroupList.txt", "r")
fileAsList = file.readlines()
indexOfTitle = fileAsList.index("Groups Pulled from AD\n")
i = indexOfTitle + 1
while i <= len(fileAsList):
    fileLocation = 'C:\\AD Access\\%s\\%s.txt' % (fileAsList[i][:5], fileAsList[i][:fileAsList[i].find("\n")])
    #Creates the dir if it does not exist already
    if not os.path.isdir(os.path.dirname(fileLocation)):
        os.makedirs(os.path.dirname(fileLocation))
    fileGroup = open(fileLocation, "w+")
    #writes group members to the open file
    group = find_group(fileAsList[i][:fileAsList[i].find("\n")])
    print group.member
    for group_member in group.member: #this is line 44
        fileGroup.write(group_member.cn + "\n")
    fileGroup.close()
    i+=1
Disclaimer: I don't know python, but I know Active Directory fairly well.
If it's failing on this:
for group_member in group.member:
It could possibly mean that the group has no members.
Depending on how Python handles this, it could also mean that the group has only one member and group.member is a plain string rather than an array.
What does print group.member show?
The source code of active_directory.py is here: https://github.com/tjguk/active_directory/blob/master/active_directory.py
These are the relevant lines:
if name not in self._delegate_map:
    try:
        attr = getattr(self.com_object, name)
    except AttributeError:
        try:
            attr = self.com_object.Get(name)
        except:
            raise AttributeError
So it looks like it simply can't find the attribute you're looking up, in this case the 'member' attribute: neither the COM object itself nor its Get() call returns it, and the wrapper then raises a bare AttributeError.
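Combining both answers, a defensive way to read the members is sketched below. FakeGroup and group_members are hypothetical names; FakeGroup only mimics the behaviour of the __getattr__ quoted above (a bare AttributeError for a missing attribute), and the assumption that multi-valued attributes come back as a list or tuple should be checked against what print group.member actually shows.

```python
from __future__ import print_function


class FakeGroup(object):
    # Hypothetical stand-in for an active_directory group object:
    # raises AttributeError when the requested attribute is missing.
    def __init__(self, **attrs):
        self._attrs = attrs

    def __getattr__(self, name):
        try:
            return self.__dict__['_attrs'][name]
        except KeyError:
            raise AttributeError(name)


def group_members(group):
    """Return the group's members as a list, whatever shape 'member' has."""
    # getattr with a default swallows the AttributeError for a missing attribute
    members = getattr(group, 'member', None)
    if members is None:
        return []               # attribute missing, e.g. an empty group
    if isinstance(members, (list, tuple)):
        return list(members)    # several members
    return [members]            # a single member returned as a scalar


print(group_members(FakeGroup()))                         # []
print(group_members(FakeGroup(member='CN=Only One')))     # ['CN=Only One']
print(group_members(FakeGroup(member=['CN=A', 'CN=B'])))  # ['CN=A', 'CN=B']
```

In the question's loop, for group_member in group_members(group): would then skip empty groups instead of crashing.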

Mailgun Talon: Signature extraction example throwing error

I installed mailgun/talon on GCE and was trying out the example in the README section, but it threw the following error at me:
>>> from talon import signature
>>> message = """Thanks Sasha, I can't go any higher and is why I limited it to the
... homepage.
...
... John Doe
... via mobile"""
>>> message
"Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage.\n\nJohn Doe\nvia mobile"
>>> text,signtr = signature.extract(message, sender='john.doe#example.com')
ERROR:talon.signature.extraction:ERROR when extracting signature with classifiers
Traceback (most recent call last):
File "talon/signature/extraction.py", line 57, in extract
markers = _mark_lines(lines, sender)
File "talon/signature/extraction.py", line 99, in _mark_lines
elif is_signature_line(line, sender, EXTRACTOR):
File "talon/signature/extraction.py", line 40, in is_signature_line
return classifier.decisionFunc(data, 0) > 0
AttributeError: 'NoneType' object has no attribute 'decisionFunc'
Do I need to train the model somehow (this signature seems to be the ML example)? I installed it using pip.
If you want to use signature parsing with classifiers, you just need to call talon.init() before using the library; it loads the trained classifiers. Other methods, such as talon.signature.bruteforce.extract_signature() or talon.quotations.extract_from(), don't require classifiers. Here's a full code sample:
import talon
# don't forget to init the library first;
# it loads the machine learning classifiers
talon.init()
from talon import signature
message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.

John Doe
via mobile"""
# use a name other than `signature`, so the module is not shadowed for later calls
text, sig = signature.extract(message, sender='john.doe#example.com')
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# sig == "John Doe\nvia mobile"
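Why the missing talon.init() shows up as a 'NoneType' error can be illustrated with a toy model. Everything here is hypothetical except the names EXTRACTOR and decisionFunc, which appear in the traceback: a module-level classifier slot stays None until an init function fills it, and the marking code reads it at call time.

```python
from __future__ import print_function


class _ToyClassifier(object):
    # Hypothetical "trained model"; only the method name comes from the traceback
    def decisionFunc(self, data, label):
        return 1.0 if data.startswith("via ") else -1.0


EXTRACTOR = None  # module-level slot, stays None until init() runs


def init():
    # Stands in for talon.init(), which loads the real classifiers
    global EXTRACTOR
    EXTRACTOR = _ToyClassifier()


def is_signature_line(line, classifier):
    return classifier.decisionFunc(line, 0) > 0


def mark_lines(lines):
    # EXTRACTOR is looked up at call time, so init() must have run by now
    return [is_signature_line(line, EXTRACTOR) for line in lines]


lines = ["John Doe", "via mobile"]
try:
    mark_lines(lines)
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'decisionFunc'

init()
print(mark_lines(lines))  # [False, True]
```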

lxml error while migrating a tool from Windows to Linux

I have developed a tool in Python 2.7 that takes an XSD file as input and writes the processed data into a test file.
To process the XSD file I use lxml, and I am unable to resolve this error:
AttributeError: 'Element' object has no attribute 'iterdescendants'
I don't know what is wrong with the lxml library.
Is there an lxml version for Python 2.7 that is compatible with Linux?
I import it in the file like this:
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree
I import it in only one file and pass the element tree to another file, which processes the elements. It works in the file where the import is declared and fails only in the other file.
The code that throws the error is:
for tdocNode in lincFileRootNode:
    rootNode = tdocNode.getroot()
    lchildren = rootNode.getchildren()
    for elt in lchildren:
        if 'complex' == elt.tag:
            if 'name' in elt.attrib:
                if 'element' == item.tag:
                    if 'type' in item.attrib:
                        if elt.attrib['name'] == item.attrib['type']:
                            for key in elt.iterdescendants(tag='element'):
                                bIsElemTypeSimple = False
                                bIsElemTypeSimple = process_elementtype(key, lincFileRootNode)
where lincFileRootNode is a list containing the parsed XSD document trees to be processed.
The error thrown is:
Traceback (most recent call last):
File "run.py", line 1210, in <module>
iret = xsd2dic_main()
File "run.py", line 71, in xsd2dic_main
iRet = yxsdtodic()
File "run.py", line 352, in yxsdtodic
iret = process_xsdfile(sXsdPath)
File "run.py", line 485, in xsdfile
sRet = process_dic_elementtype(item,lincFileRootNode)
File "run.py", line 817, in process_dic_elementtype
for key in elt.iterdescendants(tag='element'):
AttributeError: 'Element' object has no attribute 'iterdescendants'
I tried it both ways:
1: writing all the code in the same file
2: writing it in different files
and I still get the same error.
This is mostly a guess, but look into it.
You appear to be calling iterdescendants, which exists in lxml's implementation of the Element type. However, if lxml fails to import, you fall back on Python's built-in xml library instead, and its implementation of Element has no iterdescendants method at all. In other words, the two implementations have different public APIs. The error message itself hints at this: lxml's element class is called _Element, whereas the standard library's is called Element. Add some print statements to see which library you're actually importing, and check exactly what type elt is. If you want to be able to fall back on Python's built-in xml, you'll need to structure your code to accommodate the different APIs.
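One way to accommodate both APIs is sketched below; iter_descendants is a hypothetical helper, not part of either library. It prints which module was imported, then uses lxml's iterdescendants when available and otherwise falls back to ElementTree's iter(), skipping the element itself because iter() includes it when its tag matches (lxml's iterdescendants does not).

```python
from __future__ import print_function

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Report which implementation was actually imported
print("etree module in use:", etree.__name__)


def iter_descendants(elt, tag=None):
    """Yield descendants of elt (excluding elt itself), optionally filtered by tag."""
    if hasattr(elt, 'iterdescendants'):
        # lxml: already excludes elt itself
        for d in elt.iterdescendants(tag=tag):
            yield d
    else:
        # xml.etree: iter() includes elt itself if it matches, so skip it
        for d in elt.iter(tag):
            if d is not elt:
                yield d
```

With this, the failing line would become for key in iter_descendants(elt, 'element'):. Note that real XSD elements carry namespaced tags such as {http://www.w3.org/2001/XMLSchema}element, so the tag filter has to match that form.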

Unit testing in Web2py

I'm following the instructions from this post but cannot get my methods recognized globally.
The error message:
ERROR: test_suggest_performer (__builtin__.TestSearch)
----------------------------------------------------------------------
Traceback (most recent call last):
File "applications/myapp/tests/test_search.py", line 24, in test_suggest_performer
suggs = suggest_flavors("straw")
NameError: global name 'suggest_flavors' is not defined
My test file:
import unittest
from gluon.globals import Request
db = test_db
execfile("applications/myapp/controllers/search.py", globals())
class TestSearch(unittest.TestCase):
    def setUp(self):
        request = Request()
    def test_suggest_flavors(self):
        suggs = suggest_flavors("straw")
        self.assertEqual(len(suggs), 1)
        self.assertEqual(suggs[0][1], 'Strawberry')
My controller:
def suggest_flavors(term):
    return []
Has anyone successfully completed unit testing like this in web2py?
Please see: http://web2py.com/AlterEgo/default/show/260
Note that in your example the function 'suggest_flavors' should be defined in 'applications/myapp/controllers/search.py'.
I don't have any experience with web2py, but I have used other frameworks a lot, and looking at your code I'm a bit confused. Is there an objective reason why execfile should be used? Isn't it better to use a regular import statement? Instead of execfile you could write:
from applications.myapp.controllers.search import suggest_flavors
That is clearer code for Python programmers.
Note that in this case you need to place an __init__.py in each directory along the path, so that the directories form a package/module hierarchy.
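A self-contained sketch of the import approach: it recreates the question's directory layout in a temporary directory (not a real web2py tree), adds the __init__.py files, and imports the stub controller. In Python 2, which this question uses, the __init__.py files are required; Python 3.3+ would also import these directories as namespace packages. One caveat, stated as an assumption: web2py normally executes controllers in an environment that injects names such as request and db, which is presumably why the post uses execfile, so a plain import only works for controller functions that, like this stub, don't rely on those names.

```python
from __future__ import print_function

import importlib
import os
import sys
import tempfile

# Recreate the question's layout in a throwaway directory
base = tempfile.mkdtemp()
controllers = os.path.join(base, 'applications', 'myapp', 'controllers')
os.makedirs(controllers)

# One __init__.py per directory along the import path
for parts in (('applications',),
              ('applications', 'myapp'),
              ('applications', 'myapp', 'controllers')):
    open(os.path.join(base, *(parts + ('__init__.py',))), 'w').close()

# The controller from the question
with open(os.path.join(controllers, 'search.py'), 'w') as f:
    f.write("def suggest_flavors(term):\n    return []\n")

sys.path.insert(0, base)  # the directory that contains `applications`
if hasattr(importlib, 'invalidate_caches'):
    importlib.invalidate_caches()  # Python 3.3+: pick up freshly written files

from applications.myapp.controllers.search import suggest_flavors

print(suggest_flavors("straw"))  # []
```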