I have this inherited code which in Python 2.7 successfully returns results in xml that are then parsed by ElementTree.
result = alchemyObj.TextGetRankedNamedEntities(text)
root = ET.fromstring(result)
I am updating the program to Python 3.5 and am attempting the following so that I don't need to modify the XML parsing of the results:
result = alchemy_language.entities(outputMode='xml', text='text', max_items='10'),
root = ET.fromstring(result)
Per http://www.ibm.com/watson/developercloud/alchemy-language/api/v1/#entities, outputMode allows a choice between JSON (the default) and XML. However, I get this error:
Traceback (most recent call last):
  File "bin/nerv35.py", line 93, in <module>
    main()
  File "bin/nerv35.py", line 55, in main
    result = alchemy_language.entities(outputMode='xml', text='text', max_items='10'),
TypeError: entities() got an unexpected keyword argument 'outputMode'
Does outputMode actually still exist? If so, what is wrong with the entities parameters?
The watson-developer-cloud package does not appear to have this option for entities(). The parameters it accepts are:
html
text
url
disambiguate
linked_data
coreference
quotations
sentiment
show_source_text
max_items
language
model
You can try accessing the API directly by using requests. For example:
import requests

alchemyApiKey = 'YOUR_API_KEY'
url = 'https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities'
payload = {
    'apikey': alchemyApiKey,
    'outputMode': 'xml',
    'text': 'This is an example text. IBM Corp',
}
# POST the form-encoded payload; the response body is the XML document
r = requests.post(url, data=payload)
print(r.text)
Should return this:
<?xml version="1.0" encoding="UTF-8"?>
<results>
    <status>OK</status>
    <usage>By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html</usage>
    <url></url>
    <language>english</language>
    <entities>
        <entity>
            <type>Company</type>
            <relevance>0.961433</relevance>
            <count>1</count>
            <text>IBM Corp</text>
        </entity>
    </entities>
</results>
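Since the goal was to keep the existing ElementTree parsing, here is a minimal sketch of reading that response with ET.fromstring. It works on a copy of the sample response above (with the usage text shortened) rather than on a live call, and the helper name extract_entities is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the answer above; the <usage> text is shortened.
sample_response = """<?xml version="1.0" encoding="UTF-8"?>
<results>
    <status>OK</status>
    <usage>shortened for this example</usage>
    <url></url>
    <language>english</language>
    <entities>
        <entity>
            <type>Company</type>
            <relevance>0.961433</relevance>
            <count>1</count>
            <text>IBM Corp</text>
        </entity>
    </entities>
</results>"""


def extract_entities(xml_text):
    """Return (type, text, relevance) for every <entity> in the response."""
    root = ET.fromstring(xml_text)
    return [
        (e.findtext('type'), e.findtext('text'), float(e.findtext('relevance')))
        for e in root.iter('entity')
    ]


entities = extract_entities(sample_response)
print(entities)
```

With the requests example, the same helper could be given r.text in place of sample_response.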
I am iterating over an XML tree using the lxml.etree function iterparse().
This works fine with an input file:
xml_source = "formatted_html_diff.xml"
context = ET.iterparse(xml_source, events=("start",))
event, root = context.next()
However, I would like to use a string containing the same information in the file.
I tried using
context = ET.iterparse(StringIO(result), events=("start",))
But this causes the following error:
Traceback (most recent call last):
  File "c:/Users/pag/Documents/12_raw_handle/remove_from_xhtmlv02.py", line 96, in <module>
    event, root = context.next()
  File "src\lxml\iterparse.pxi", line 209, in lxml.etree.iterparse.__next__
TypeError: reading file objects must return bytes objects
Does anyone know how could I solve this error?
Thanks in advance.
Use BytesIO instead of StringIO. The following code works with both Python 2.7 and Python 3:
from lxml import etree
from io import BytesIO
xml = """
<root>
<a/>
<b/>
</root>"""
context = etree.iterparse(BytesIO(xml.encode("UTF-8")), events=("start",))
print(next(context))
print(next(context))
print(next(context))
Output:
('start', <Element root at 0x315dc10>)
('start', <Element a at 0x315dbc0>)
('start', <Element b at 0x315db98>)
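The same BytesIO pattern can be wrapped in a small helper. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; that swap and the helper name start_tags are choices made for this example, not part of the answer above.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def start_tags(xml_text):
    """Return the tag of each element, in the order its 'start' event fires."""
    # Encode the string so the parser reads bytes, as in the answer above.
    source = BytesIO(xml_text.encode("utf-8"))
    return [elem.tag for _event, elem in ET.iterparse(source, events=("start",))]


tags = start_tags("<root><a/><b/></root>")
print(tags)
```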
I am trying to extract the <comment> tag (using xml.etree.ElementTree) from the XML and find the comment count number and add all of the numbers. I am reading the file via a URL using urllib package.
sample data: http://python-data.dr-chuck.net/comments_42.xml
For now, I am just trying to print the name and count.
import urllib
import xml.etree.ElementTree as ET
serviceurl = 'http://python-data.dr-chuck.net/comments_42.xml'
address = raw_input("Enter location: ")
url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
print ("Retrieving: ", url)
link = urllib.urlopen(url)
data = link.read()
print("Retrieved ", len(data), "characters")
tree = ET.fromstring(data)
tags = tree.findall('.//comment')
for tag in tags:
    Name = ''
    count = ''
    Name = tree.find('commentinfo').find('comments').find('comment').find('name').text
    count = tree.find('comments').find('comments').find('comment').find('count').number
    print Name, count
Unfortunately, I am not even able to parse the XML file in Python, because I am getting the following error:
Traceback (most recent call last):
  File "ch13_parseXML_assignment.py", line 14, in <module>
    tree = ET.fromstring(data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 49
I have read that in a similar situation the parser may not be accepting the XML file. Anticipating this, I wrapped tree = ET.fromstring(data) in a try/except and got past this line, but later the code throws an error saying the tree variable is not defined. This defeats the purpose of the output I am expecting.
Can somebody please point me in a direction that helps me?
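A minimal sketch of one possible direction, under stated assumptions: the sample XML below is hypothetical, and its layout (commentinfo/comments/comment with name and count children) is only inferred from the find() chain in the question. It shows reading name and count from each element yielded by the loop and handling a parse failure explicitly, so that tree is never left undefined. It may also be worth checking the URL being built, since the code appends the query string directly to the .xml path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real feed's layout is an assumption.
sample = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>10</count></comment>
    <comment><name>Bob</name><count>5</count></comment>
  </comments>
</commentinfo>"""


def parse_or_none(text):
    """Parse XML text; return None (instead of leaving a name undefined) on failure."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("Could not parse response:", err)
        return None


tree = parse_or_none(sample)
total = 0
if tree is not None:
    for comment in tree.findall('.//comment'):
        # Read from the current <comment>, not from the tree root.
        name = comment.find('name').text
        count = int(comment.find('count').text)
        print(name, count)
        total += count
print("Sum:", total)
```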
I am new to Python as well as XML. I am trying to parse an XML file, find the values, and compute the sum of those values. I have included the code as well as the data below.
import xml.etree.ElementTree as ET
data='''
<place>
<note>Test data</note>
<hospitals>
<doctor>
<name>John</name>
<count>97</count>
</doctor>
<doctor>
<name>Sam</name>
<count>97</count>
</doctor>
<doctor>
<name>Luke</name>
<count>90</count>
</doctor>
<doctor>
<name>Mark</name>
<count>90</count>
</doctor>
</hospitals>
</place> '''
tree=ET.fromstring (data)
for lines in tree.findall('place/hospitals/doctor'):
    print lines.get('count'), lines.text
When I execute the above code, I am not getting any output.
Then I changed the code to :
tree=ET.fromstring (data)
print 'count:',tree.find('count').text
and the output is:
Traceback (most recent call last):
  File "test2.py", line 26, in <module>
    print 'count:',tree.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'
Any help is appreciated guys.
Thank you
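Element.findall() with a plain tag name matches only direct children of the current element, and a path is resolved relative to that element. Here tree is already the root <place> element, so 'place/hospitals/doctor' looks for a <place> child that does not exist. Also, get() reads attributes, whereas count is a child element. The ElementTree documentation explains this and includes code examples.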
For now, try this:
for line in tree.findall('./hospitals/doctor/count'):
    print line.text
The above code just prints the counts. You will have to write the code to sum them up.
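As a minimal sketch of the summing step, using the data from the question (written for Python 3; the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

data = '''
<place>
  <note>Test data</note>
  <hospitals>
    <doctor><name>John</name><count>97</count></doctor>
    <doctor><name>Sam</name><count>97</count></doctor>
    <doctor><name>Luke</name><count>90</count></doctor>
    <doctor><name>Mark</name><count>90</count></doctor>
  </hospitals>
</place>'''

root = ET.fromstring(data)
# The path is relative to the root <place> element.
counts = [int(node.text) for node in root.findall('./hospitals/doctor/count')]
total = sum(counts)
print(counts, total)
```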
I'm trying to use ElementTree to get data from a .config file. The structure of this file looks like this, for example:
<userSettings>
    <AutotaskUpdateTicketEstimatedHours.My.MySettings>
        <setting name="Username" serializeAs="String">
            <value>AAA</value>
        </setting>
My code is this:
import os, sys
import xml.etree.ElementTree as ET
class Init():
    script_dir = os.path.dirname(__file__)
    rel_path = "app.config"
    abs_file_path = os.path.join(script_dir, rel_path)
    tree = ET.parse(abs_file_path)
    root = tree.getroot()
    sites = root.iter('userSettings')
    for site in sites:
        apps = site.findall('AutotaskUpdateTicketEstimatedHours.My.MySettings')
        for app in apps:
            print(''.join([site.get('Username'), app.get('value')]))
if __name__ == '__main__':
    handler = Init()
However, when I run this code I get:
Traceback (most recent call last):
  File "/Users/AAAA/Documents/Aptana/AutotaskUpdateTicketEstimatedHours/Main.py", line 5, in <module>
    class Init():
  File "/Users/AAA/Documents/Aptana/AutotaskUpdateTicketEstimatedHours/Main.py", line 16, in Init
    print(''.join([site.get('Username'), app.get('value')]))
TypeError: sequence item 0: expected string, NoneType found
What am I doing wrong that causes this error?
(My problem seems to be accessing the tree structure of my config file correctly.)
You may change your code to:
print(''.join([app.get('name'), app.find('value').text]))
For this to work, app must be a <setting> Element object, which means the loop has to go one level deeper than the original code (for example, by calling findall('setting') on each <AutotaskUpdateTicketEstimatedHours.My.MySettings> element). The get function returns an attribute value by name (e.g. name, serializeAs); the find function returns a subelement (e.g. <value>).
Once you have <value>, you can get the data inside it with text.
Note that site in the original code has no attributes, so site.get('Username') returns None.
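Put together, a minimal sketch of the full loop might look like this. The closing tags and the <configuration> wrapper are assumptions added to make the fragment from the question parseable, and the settings dictionary is just one way to collect the result.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, completed with assumed closing tags and
# an assumed <configuration> root element.
config_text = """<configuration>
  <userSettings>
    <AutotaskUpdateTicketEstimatedHours.My.MySettings>
      <setting name="Username" serializeAs="String">
        <value>AAA</value>
      </setting>
    </AutotaskUpdateTicketEstimatedHours.My.MySettings>
  </userSettings>
</configuration>"""

root = ET.fromstring(config_text)
settings = {}
for section in root.iter('userSettings'):
    for app in section.findall('AutotaskUpdateTicketEstimatedHours.My.MySettings'):
        # Descend one more level: each <setting> carries the name attribute
        # and has a <value> child holding the data.
        for setting in app.findall('setting'):
            settings[setting.get('name')] = setting.find('value').text
print(settings)
```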
I have a file with the content below:
<?xml version='1.0' encoding='UTF-8'?><cont:ContactId><all:Individual.partyId>10028305</all:Individual.partyId></cont:ContactId><all:applicationId>C18400</all:applicationId>
I need to extract "C18400" from this file and print it in the output in Sikuli. Can a Python script fetch this value, or is there another way to do it? Could Sikuli image capture be used instead, given that the application ID I need is dynamic and keeps changing?
I used the Python script below, but it didn't work.
CODE:
with open("D:\\SODS.txt","r") as fp:
    for line in fp:
        if "applicationId" in line:
            print "true"
            print re.search(r"applicationId\>(.+?)\<",fp)
ERROR:
TypeError: expected str or unicode but got
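One thing that stands out in the script is that re.search is given the file object fp rather than the current line. A minimal sketch of the search applied to a string, using the one-line content shown above (the helper name find_application_id is made up for illustration):

```python
import re

# The single-line file content from the question.
line = ("<?xml version='1.0' encoding='UTF-8'?><cont:ContactId>"
        "<all:Individual.partyId>10028305</all:Individual.partyId></cont:ContactId>"
        "<all:applicationId>C18400</all:applicationId>")


def find_application_id(text):
    """Return the text between 'applicationId>' and the next '<', or None."""
    match = re.search(r"applicationId>(.+?)<", text)
    return match.group(1) if match else None


application_id = find_application_id(line)
print(application_id)
```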
Actually, my file looks like this:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<cont:CreateContactResponse xmlns:cont="http://DataModelContactManagement" xmlns:all="http://MID/Business/0.1/All" xmlns:com="http://SOA/1.0/Common">
<com:status>0</com:status>
<com:description/>
<cont:ContactId>
<all:MIDBusCm.Individual.partyId>10028310</all:MIDBusCm.Individual.partyId>
</cont:ContactId>
<all:MIDBusCu.CustomerOrder.applicationId>C18403</all:MIDBusCu.CustomerOrder.applicationId>
</cont:CreateContactResponse>
I tried the Python script below:
import xml.etree.ElementTree as ET
data = open("D:\\Pravina\\Projects\\Meteor\\Automation\\API\\SODS.txt", 'r')
tree = ET.parse(data)
doc = tree.getroot()
print doc
rootText='.//{http://SOA/1.0/Common}'
errorCode=tree.find(rootText + 'status').text
print errorCode
With the above code, the output I got was:
"/schemas.xmlsoap.org/soap/envelope/}Envelope at a>
0 "
In my case the output should be "C18403" this is in between the tag 'all:MIDBusCu.CustomerOrder.applicationId'.
Please let me know if I am missing something here?
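The script above looks up status in the com namespace, which is why it prints 0. A minimal sketch of looking up the applicationId element in the all namespace instead, using the XML from the question with the closing Body and Envelope tags added as an assumption:

```python
import xml.etree.ElementTree as ET

# XML from the question; the last two closing tags are assumed.
soap_text = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <cont:CreateContactResponse xmlns:cont="http://DataModelContactManagement" xmlns:all="http://MID/Business/0.1/All" xmlns:com="http://SOA/1.0/Common">
      <com:status>0</com:status>
      <com:description/>
      <cont:ContactId>
        <all:MIDBusCm.Individual.partyId>10028310</all:MIDBusCm.Individual.partyId>
      </cont:ContactId>
      <all:MIDBusCu.CustomerOrder.applicationId>C18403</all:MIDBusCu.CustomerOrder.applicationId>
    </cont:CreateContactResponse>
  </soapenv:Body>
</soapenv:Envelope>"""

root = ET.fromstring(soap_text)
# Map the prefix used in the search path to the namespace URI from the document.
ns = {"all": "http://MID/Business/0.1/All"}
node = root.find(".//all:MIDBusCu.CustomerOrder.applicationId", ns)
application_id = node.text if node is not None else None
print(application_id)
```

When reading from the file, ET.parse(path).getroot() would take the place of ET.fromstring(soap_text).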