Use text file to provide input for etree xml - python-2.7

I have an XML file and I need to change two parameters in it using etree.
XML file:
<?xml version="1.0"?>
<ABC_Input xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<REQ>
<!-- Optional in XSD -->
<INFO>ALL</INFO>
</REQ>
<PO>
<PO_ID>3557698</PO_ID>
<!-- Req in XSD -->
<RAN>HQF011512C</RAN>
<!-- Req in XSD -->
</PO>
</ABC_Input>
I have written the code below to achieve this:
import xml.etree.ElementTree as ET
tree = ET.parse('alpha.xml')
root = tree.getroot()
for val in root.findall('PO'):
    val.find('RAN').text = "HQ123"
tree.write('output1.xml')
Now I need to read the values of RAN and PO_ID from a text file instead of hard-coding them. How can I do that?

Try this:
import xml.etree.ElementTree as ET
# Read the text file (called alpha.txt) and extract the lines
with open('alpha.txt', 'r') as txtF:
    lines = txtF.readlines()
# A dictionary to hold the PO_ID and RAN values
values = {}
# If PO_ID or RAN appears more than once, the last entry in the text file wins
for line in lines:
    line = line.strip()
    # The slice skips the key and the single separator character that follows it
    if 'RAN' in line:
        values['RAN'] = line[line.find('RAN') + len('RAN') + 1:]
    elif 'PO_ID' in line:
        values['PO_ID'] = line[line.find('PO_ID') + len('PO_ID') + 1:]
tree = ET.parse('alpha.xml')
root = tree.getroot()
for val in root.findall('PO'):
    val.find('PO_ID').text = values['PO_ID']
    val.find('RAN').text = values['RAN']
tree.write('output1.xml')
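The slicing in the answer assumes each line of alpha.txt looks like RAN=HQ123 or PO_ID=42, with one separator character after the key. As a minimal sketch under that assumption (the helper names parse_values and apply_values are invented for illustration), the same idea can be written by splitting on the first "=", which also avoids the substring test matching unrelated lines:

```python
import xml.etree.ElementTree as ET


def parse_values(text, keys=("PO_ID", "RAN")):
    """Return {key: value} for lines of the assumed form KEY=VALUE."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in keys:
            # Later lines overwrite earlier ones, as in the answer above.
            values[key.strip()] = value.strip()
    return values


def apply_values(root, values):
    """Set the text of each matching child of every <PO> element."""
    for po in root.findall("PO"):
        for key, value in values.items():
            child = po.find(key)
            if child is not None:
                child.text = value


# Inline stand-ins for alpha.xml and alpha.txt so the sketch runs on its own.
xml_text = "<ABC_Input><PO><PO_ID>3557698</PO_ID><RAN>HQF011512C</RAN></PO></ABC_Input>"
root = ET.fromstring(xml_text)
values = parse_values("PO_ID = 42\nRAN = HQ123\n")
apply_values(root, values)
```

With real files, the text would come from open('alpha.txt').read() and the tree from ET.parse('alpha.xml').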

Related

Is outputMode Still Supported In alchemy_language.entities

I have this inherited code which in Python 2.7 successfully returns results in xml that are then parsed by ElementTree.
result = alchemyObj.TextGetRankedNamedEntities(text)
root = ET.fromstring(result)
I am updating the program to Python 3.5 and am attempting to do this so that I don't need to modify the XML parsing of the results:
result = alchemy_language.entities(outputMode='xml', text='text', max_items='10'),
root = ET.fromstring(result)
Per http://www.ibm.com/watson/developercloud/alchemy-language/api/v1/#entities outputMode allows the choice between json default and xml. However, I get this error:
Traceback (most recent call last):
  File "bin/nerv35.py", line 93, in <module>
    main()
  File "bin/nerv35.py", line 55, in main
    result = alchemy_language.entities(outputMode='xml', text='text', max_items='10'),
TypeError: entities() got an unexpected keyword argument 'outputMode'
Does outputMode actually still exist? If so, what is wrong with the entities parameters?
The watson-developer-cloud package does not appear to have this option for entities. The parameters it allows are:
html
text
url
disambiguate
linked_data
coreference
quotations
sentiment
show_source_text
max_items
language
model
You can try accessing the API directly by using requests. For example:
import requests
alchemyApiKey = 'YOUR_API_KEY'
url = 'https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities'
payload = {
    'apikey': alchemyApiKey,
    'outputMode': 'xml',
    'text': 'This is an example text. IBM Corp',
}
# Send the parameters as form data in the POST body
r = requests.post(url, data=payload)
print(r.text)
This should return:
<?xml version="1.0" encoding="UTF-8"?>
<results>
  <status>OK</status>
  <usage>By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html</usage>
  <url></url>
  <language>english</language>
  <entities>
    <entity>
      <type>Company</type>
      <relevance>0.961433</relevance>
      <count>1</count>
      <text>IBM Corp</text>
    </entity>
  </entities>
</results>
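Once the XML text is in hand, the ElementTree parsing from the question can stay as it was. A small sketch, using a trimmed copy of the sample response as a literal string so that it runs without calling the service (with requests, r.text would take its place):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response; in real use this would be r.text.
result = """<results>
  <status>OK</status>
  <language>english</language>
  <entities>
    <entity>
      <type>Company</type>
      <relevance>0.961433</relevance>
      <count>1</count>
      <text>IBM Corp</text>
    </entity>
  </entities>
</results>"""

root = ET.fromstring(result)
# Collect (type, text, relevance) for every <entity> in the response.
entities = [
    (entity.findtext("type"), entity.findtext("text"), float(entity.findtext("relevance")))
    for entity in root.iter("entity")
]
print(entities)
```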

Using diff with beautiful soup objects

I am trying to compare the text of all instances of a particular tag in two XML files. The OCR engine I am using outputs an XML file with all the OCR characters in a tag <OCRCharacters>...</OCRCharacters>.
I am using Python 2.7.11 and Beautiful Soup 4 (bs4). From the terminal, I call my Python program with two XML file names as arguments.
I want to extract all the strings in the <OCRCharacters> tags of each file, compare them line by line with difflib, and write a new file with the differences.
I use $ python parse_xml_file.py file1.xml file2.xml to call the program from the terminal.
The code below opens each file and prints each string in the <OCRCharacters> tags. How should I convert the objects made with bs4 to strings that I can use with difflib? I am open to better ways (using Python) to do this.
import sys
with open(sys.argv[1], "r") as f1:
    xml_doc_1 = f1.read()
with open(sys.argv[2], "r") as f2:
    xml_doc_2 = f2.read()
from bs4 import BeautifulSoup
soup1 = BeautifulSoup(xml_doc_1, 'xml')
soup2 = BeautifulSoup(xml_doc_2, 'xml')
print("#####################",sys.argv[1],"#####################")
for tag in soup1.find_all('OCRCharacters'):
    print(repr(tag.string))
    temp1 = repr(tag.string)
    print(temp1)
print("#####################",sys.argv[2],"#####################")
for tag in soup2.find_all('OCRCharacters'):
    print(repr(tag.string))
    temp2 = repr(tag.string)
You can try this:
import sys
import difflib
from bs4 import BeautifulSoup
# text[0] and text[1] hold the <OCRCharacters> strings of the first and second file
text = [[], []]
for i, arg in enumerate(sys.argv[1:3]):
    with open(arg, "r") as f:
        soup = BeautifulSoup(f.read(), 'xml')
    for tag in soup.find_all('OCRCharacters'):
        text[i].append(tag.get_text())
# Compare the strings pairwise, in document order
d = difflib.Differ()
for first_string, second_string in zip(text[0], text[1]):
    diff = d.compare(first_string.splitlines(), second_string.splitlines())
    print('\n'.join(diff))
With xml1.xml:
<node>
  <OCRCharacters>text1_1</OCRCharacters>
  <OCRCharacters>text1_2</OCRCharacters>
  <OCRCharacters>Same Value</OCRCharacters>
</node>
and xml2.xml:
<node>
  <OCRCharacters>text2_1</OCRCharacters>
  <OCRCharacters>text2_2</OCRCharacters>
  <OCRCharacters>Same Value</OCRCharacters>
</node>
The output will be:
- text1_1
?     ^
+ text2_1
?     ^
- text1_2
?     ^
+ text2_2
?     ^
  Same Value
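Since the question also asks for the differences to be written to a new file, here is a sketch that uses difflib.unified_diff on the extracted strings. To stay self-contained it parses the two sample documents with the standard library's xml.etree.ElementTree instead of Beautiful Soup, and the helper names ocr_strings and write_diff are made up for this example:

```python
import difflib
import xml.etree.ElementTree as ET

XML1 = ("<node><OCRCharacters>text1_1</OCRCharacters>"
        "<OCRCharacters>text1_2</OCRCharacters>"
        "<OCRCharacters>Same Value</OCRCharacters></node>")
XML2 = ("<node><OCRCharacters>text2_1</OCRCharacters>"
        "<OCRCharacters>text2_2</OCRCharacters>"
        "<OCRCharacters>Same Value</OCRCharacters></node>")


def ocr_strings(xml_text):
    """Return the text of every <OCRCharacters> element, in document order."""
    return [el.text or "" for el in ET.fromstring(xml_text).iter("OCRCharacters")]


def write_diff(path, lines):
    """Write the diff lines to a text file, one per line."""
    with open(path, "w") as out:
        out.write("\n".join(lines) + "\n")


# Each extracted string is treated as one "line" of the comparison.
diff_lines = list(difflib.unified_diff(
    ocr_strings(XML1), ocr_strings(XML2),
    fromfile="xml1.xml", tofile="xml2.xml", lineterm=""))
```

Calling write_diff("diff.txt", diff_lines) would then save the result; the file name is an arbitrary choice.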

Add duplicate child tag in existing xml

sample.py
import xml.etree.cElementTree as ET
log_file=open("filename.xml","a")
root = ET.Element("VOD")
doc = ET.Element("SessionDetails")
root.append(doc)
tree = ET.ElementTree(root)
tree.write("filename.xml")
Output after running sample.py 3 times:
<?xml version="1.0"?>
<VOD>
  <SessionDetails/>
</VOD>
Desired output: if I run sample.py 3 times, the file should look like this (which I am not getting):
<VOD>
  <SessionDetails/>
  <SessionDetails/>
  <SessionDetails/>
</VOD>
I got the result using the method below.
First-time XML creation:
from xml.dom.minidom import getDOMImplementation
impl = getDOMImplementation()
newdoc = impl.createDocument(None, "VOD", None)
top_element = newdoc.documentElement
text = newdoc.createElement('SessionDetails')
top_element.appendChild(text)
# Close the file explicitly so the XML is flushed to disk
with open("filename.xml", "w") as f:
    newdoc.writexml(f)
For appending data to the XML:
import xml.dom.minidom as m
doc = m.parse("filename.xml")
valeurs = doc.getElementsByTagName("VOD").item(0)
element = doc.createElement("SessionDetails")
valeurs.appendChild(element)
with open("filename.xml", "w") as f:
    doc.writexml(f)
Reference:
http://stackoverflow.com/questions/11074021/inserting-xml-nodes-in-an-existing-xml-document-with-python
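The same result can also be reached with ElementTree, which the question started from. A minimal sketch, assuming the file either does not exist yet or already holds a <VOD> root; the helper name append_session is invented for this example:

```python
import os
import xml.etree.ElementTree as ET


def append_session(path):
    """Add one <SessionDetails/> child to the <VOD> root stored at path."""
    if os.path.exists(path):
        tree = ET.parse(path)       # reuse the existing document
        root = tree.getroot()
    else:
        root = ET.Element("VOD")    # first run: start a new document
        tree = ET.ElementTree(root)
    ET.SubElement(root, "SessionDetails")
    tree.write(path)
    # Return the number of <SessionDetails> children now in the file.
    return len(root.findall("SessionDetails"))
```

The point is to parse the existing file before adding the child, rather than building a fresh root on every run, which is why the original script always ended up with a single element.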

how to lookup the numbers next to character using python

This is just part of a long Python script. There is a file called aqfile that has many parameters. I would like to extract what comes after "OWNER" and "NS".
Note:
OWNER = text
NS = numbers
I could extract what comes after OWNER, because it is just text:
for line in aqfile.readlines():
    if string.find(line,"OWNER")>0:
        print line
        m=re.search('<(.*)>',line)
        owner=incorp(m.group(1))
        break
But when I modify the script to extract the numbers
for line in aqfile.readlines():
    if string.find(line,"NS")>0:
        print line
        m=re.search('<(.*)>',line)
        ns=incorp(m.group(1))
        break
it doesn't work any more.
Can anyone help me?
This is the whole script:
#Make a CSV file of dataset names, pulse program and, if available, (part of) the title
#Note: the whole file tree is read into memory!!! Do not start too high in the tree!!!
import os
import os.path
import fnmatch
import re
import string
max=20000
outfiledesc=0
def incorp(c):
    #Replace " with """, and CR/LF with blanks
    c=c.replace('"','"""')
    c=c.replace("\r"," ")
    c=c.replace("\n"," ")
    return "\"%s\"" % (c)
def process(arg,root,files):
    global max
    global outfiledesc
    #Get name,expno,procno from the root
    if "proc" in files:
        procno = incorp(os.path.basename(root))
        oneup = os.path.dirname(root)
        oneup = os.path.dirname(oneup)
        aqdir=oneup
        expno = incorp(os.path.basename(oneup))
        oneup = os.path.dirname(oneup)
        dsname = incorp(os.path.basename(oneup))
        #Read the title file, if any
        if (os.path.isfile(root + "/title")):
            f=open(root+"/title","r")
            title=incorp(f.read(max))
            f.close()
        else:
            title=""
        #Grab the pulse program name from the acqus parameter
        aqfile=open(aqdir+"/acqus")
        for line in aqfile.readlines():
            if string.find(line,"PULPROG")>0:
                print line
                m=re.search('<(.*)>',line)
                pulprog=incorp(m.group(1))
                break
        towrite= "%s;%s;%s;%s;%s\n" % (dsname,expno,procno,pulprog,title)
        outfiledesc.write(towrite)
#Main program
dialogline1="Starting point of the search"
dialogline2="Maximum length of the title"
dialogline3="output CSV file"
def1="/opt/topspin3.2/data/nmrafd/nmr"
def2="20000"
def3="/home/nmrafd/filelist.csv"
result = INPUT_DIALOG("CSV file creator","Create a CSV list",[dialogline1,dialogline2,dialogline3],[def1,def2,def3])
start=result[0]
tlength=int(result[1])
outfile=result[2]
#Search for procs files. They should be in any dataset.
outfiledesc = open(outfile,"w")
print start
os.path.walk(start,process,"")
outfiledesc.close()
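One possible cause, offered as an assumption because the question does not show the file: the pattern '<(.*)>' only matches values wrapped in angle brackets, so a numeric line without them makes re.search return None and m.group(1) fail. Assuming the lines look roughly like ##$PULPROG= <zg30> and ##$NS= 16 (hypothetical sample lines, not taken from the asker's data), a sketch that accepts both forms might be:

```python
import re

# Hypothetical sample lines; the real acqus content is not shown in the question.
sample = """##$NS= 16
##$PULPROG= <zg30>
##OWNER= nmrafd
"""

# Parameter name, then '=', then either <bracketed text> or the bare rest of the line.
PARAM = re.compile(r"^##\$?(\w+)=\s*(?:<(.*)>|(.*))$")


def read_params(text):
    """Map parameter names to their values, with or without angle brackets."""
    params = {}
    for line in text.splitlines():
        m = PARAM.match(line.strip())
        if m:
            name, bracketed, bare = m.groups()
            params[name] = bracketed if bracketed is not None else bare.strip()
    return params


params = read_params(sample)
ns = int(params["NS"])
```

Matching the whole parameter name also avoids picking up other lines that merely contain the letters "NS", which the substring test in the question would do.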

How do I link a local disk location URL to a tag in XML?

I am pretty new to XML and to using XML with Python. I am using the lxml module for this. My objective is to produce something like:
<include>
<!--This is the result--> #This is for naming the result of the file .
<check run = "1000">
<params>
<param name="Name" path="$${path_to_the_file_in_local_disk}"/>
</params>
<True>
<variable name="File1" path=""/>
<variable name="File2" path="c:\xyz"/>
<variable name="File3" path="c:\xyz"/>
<variable name="File4" path="c:\xyz"/>
<variable name="File5" path="c:\xyz"/>
<variable name="File6" path="c:\xyz"/>
<variable name="File7" path="c:\xyz"/>
<variable name="File8" path="c:\xyz"/>
</variables>
</user>
</include>
I want to generate this dynamically. Say I have some 10 files and, based on certain search criteria, I need to classify them. Let's say the classification is True or False.
So, under the True section, I have some 4 files. I want to make an entry in the XML for each of them with its file location on the local disk. When I open the XML file in a browser, the link in the XML file should open the directory for me.
So my questions are:
1. How do I create an XML tag each time a condition is met?
2. How do I link it to the local disk location?
So far, I have only printed the result to the console:
f = open('./script.log', 'r')
for lines in f.readlines():
    passed = lines.find("=== Result: PASS ===")
    failed = lines.find("=== Result: FAIL ===")
    if passed != -1:
        print "True File"
        passed_cnt = passed_cnt + 1
        passed_list.append(os.getcwd())
        lookup = '* COMMAND:'
        with open('./script.log') as myFile:
            for num, line in enumerate(myFile, 1):
                if lookup in line:
                    #print 'found at line:', num
                    tc_id = (line.split('\\')[-1]).split(' ')[-3]
                    print "TRUE FILE Name : ", tc_id
                    variable = etree.SubElement(variables, "variable")
                    variable.set('name', 'path')
                    variable.set('value', '1000')
To answer the question in the title:
with open("outfile.xml", "wb") as outfile:
    outfile.write(etree.tostring(xmlroot, xml_declaration=True))
To answer the question in the post:
You link to a local file with a file: URL. On Windows these use forward slashes, with three slashes after the scheme, like this:
file:///c:/<path to the file>
Look for examples and experiment.
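To build such a URL in code, one option is to swap the backslashes for forward slashes and percent-encode the rest. A short sketch for Python 3, assuming an absolute path with a drive letter; the helper name to_file_url and the path c:\xyz\File2.txt are illustrative only:

```python
from urllib.parse import quote


def to_file_url(windows_path):
    """Convert an absolute Windows path such as c:\\xyz\\f.txt to a file: URL."""
    forward = windows_path.replace("\\", "/")
    # Keep '/' and the drive colon as they are; escape spaces and other characters.
    return "file:///" + quote(forward, safe="/:")


url = to_file_url(r"c:\xyz\File2.txt")
print(url)  # file:///c:/xyz/File2.txt
```

The resulting string can be stored in an attribute with variable.set('path', url). For paths on the machine running the script, pathlib.Path(...).as_uri() gives the same kind of URL.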
I found a way to deal with the problem. My issues were:
1. Generating an XML file.
2. This file had to be built dynamically for each and every run.
I did something like:
from __future__ import division
import os
import fnmatch
import xml.etree.cElementTree as ET
import time
import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
import datetime
from lxml import etree
import smtplib
root = etree.Element("include")
comment1 = etree.Comment("================<Your Text>================")
root.append(comment1)
user1 = etree.SubElement(root, "Complete_Results")
param = etree.SubElement(user1, "Total")
param.set('Success_Percentage', str('%.2f'%((passed_cnt/total_Count)*100)))
param = etree.SubElement(user1, "Total")
param.set('Failure_Percentage', str('%.2f'%((failed_cnt/total_Count)*100)))
param = etree.SubElement(user1, "Aggregate_Result")
if pass_percentage == 100:
    res = "_________________Successfully_Passed________________"
else:
    res = "________________Iteration_Failed________________"
param.set('Finally', res)
user1 = etree.SubElement(root, "Success_Results")
comment2 = etree.Comment("======================= Passed test cases section ==========================")
user1.append(comment2)
user1.set('Number_of_Test_cases_passed', str(passed_cnt))
params = etree.SubElement(user1, "Results")
param = etree.SubElement(params, "Success_Results")
for i in passed_TC_list:
    for location in passed_list:
        param = etree.SubElement(params, 'TC_Details')
        param.set('File_name', str(i))
        param = etree.SubElement(params, 'ID' )
        param.set('Path_in_Local_Directory',str(location))
        path = str(str(location) + str("\\") + str(i))
        param.set('Link_to_file', str(path))
        passed_list.remove(location)
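Removing items from passed_list while looping over it can skip entries. If passed_TC_list and passed_list are parallel lists (an assumption; the post does not say), pairing them with zip is one way to emit a single entry per file. The sketch below uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages, and the sample data is invented:

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for the lists built from script.log.
passed_TC_list = ["tc_001.log", "tc_002.log"]
passed_list = [r"c:\xyz\run1", r"c:\xyz\run2"]

root = ET.Element("include")
results = ET.SubElement(root, "Success_Results")
results.set("Number_of_Test_cases_passed", str(len(passed_TC_list)))

for name, location in zip(passed_TC_list, passed_list):
    # One element per passed file, created only when there is a file to report.
    detail = ET.SubElement(results, "TC_Details")
    detail.set("File_name", name)
    detail.set("Path_in_Local_Directory", location)
    detail.set("Link_to_file", location + "\\" + name)

xml_bytes = ET.tostring(root)
```

SubElement and set have the same call shape in lxml.etree, so the loop should carry over to the lxml version unchanged.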