I read several threads about XML namespaces before posting, but I am still having trouble writing a child XML element to a file without its namespace.
Even though I register the namespace with an empty prefix before parsing the file, "findall" does not return any elements. I verified that the namespace in the code matches the one in the XML file, and I also printed root.tag to confirm it.
If I remove the xmlns attribute from the tag completely, the code works, but I want to read the XML file and write to a file without the namespace. Could you please tell me what mistake I am making here?
This is the code I tried:
import xml.etree.ElementTree as ET
ET.register_namespace("","urn:iso:2012.tech.xsd.001.04") ##Making sure parse a xml file without namespace
tree = ET.parse("sample.xml")
root = tree.getroot()
print("%s : %s"%(root.tag, root.attrib))
out_handle = open("customer_header.xml","ab")
for elt in root.iter():
    all_ntry = elt.findall('Customer') ## Not returning all Customer elements, even though ET.register_namespace('',uri) mentioned before parsing
    for ele in all_ntry:
        print("Customer Block Found:%s"%ele)
        ele_tree = ET.ElementTree(ele)
        ele_tree.write(out_handle)
XML File(sample.xml):
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:xsi="http://www.company.org/2000/instance"
xmlns="urn:iso:2012.tech.xsd.001.04">
<BackToCustomer>
<CustGrup>
<Mid>000002</Mid>
<Date>2017-09-24T00:54:26</Date>
<Info>TEST</Info>
</CustGrup>
<Batch>
<Id>12345678</Id>
<Date>2017-09-22T13:54:26</Date>
<ListInfo>
<Id>
<Othr>
<Id>TEST_ListInfo</Id>
</Othr>
</Id>
</ListInfo>
<Details>
<Total>
<Count>5</Count>
<Amt>25.80</Amt>
</Total>
</Details>
<Customer>
<CustomerRef>ABC123</CustomerRef>
</Customer>
<Customer>
<CustomerRef>XYZ123</CustomerRef>
</Customer>
</Batch>
</BackToCustomer>
</Document>
I need to write only the Customer elements to a file, without the namespace.
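One possible way to do this is sketched below. It relies on two ElementTree behaviours: ET.register_namespace only controls which prefix is written on output and has no effect on searching, so findall needs the qualified name {urn:iso:2012.tech.xsd.001.04}Customer; and tags are stored as {uri}localname, so rewriting the tags drops the namespace when the element is serialized. The helper names strip_namespace and extract_customers and the shortened inline SAMPLE document are illustrative assumptions, not part of the original code.

```python
import xml.etree.ElementTree as ET

NS = "urn:iso:2012.tech.xsd.001.04"

# Shortened stand-in for sample.xml (assumption: same namespace, fewer elements)
SAMPLE = """<Document xmlns="urn:iso:2012.tech.xsd.001.04">
  <BackToCustomer><Batch>
    <Customer><CustomerRef>ABC123</CustomerRef></Customer>
    <Customer><CustomerRef>XYZ123</CustomerRef></Customer>
  </Batch></BackToCustomer>
</Document>"""


def strip_namespace(elem):
    """Remove the '{uri}' part from the tag of elem and all its descendants, in place."""
    for node in elem.iter():
        if isinstance(node.tag, str) and node.tag.startswith("{"):
            node.tag = node.tag.split("}", 1)[1]
    return elem


def extract_customers(xml_text):
    """Return each Customer element serialized without any namespace."""
    root = ET.fromstring(xml_text)
    # Search with the qualified name: {namespace-uri}localname
    customers = root.findall(".//{%s}Customer" % NS)
    # strip() drops the whitespace tail that follows each element in the source
    return [ET.tostring(strip_namespace(c), encoding="unicode").strip()
            for c in customers]


for chunk in extract_customers(SAMPLE):
    print(chunk)
```

To write to a file instead of printing, the returned strings can be written to a handle opened in text mode. Note that this sketch only rewrites element tags; namespaced attributes, if any, would keep their namespace.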
I have the XML file below, and I need to get the text inside the <Description> tag under the <CheckList> tag whose <Sequence> tag has the text 40. How can I achieve this?
<?xml version="1.0" encoding="UTF-8"?>
<SyncMaintenanceOrder>
<DataArea>
<MaintenanceOrder>
<MaintenanceOrderHeader>
<DocumentID>
<ID accountingEntity="AT">1105442</ID>
</DocumentID>
<Description>Routine Bridge Inspection - S6</Description>
<PriorityCode>2</PriorityCode>
<ReportedDateTime>2020-04-29T20:21:27Z</ReportedDateTime>
</MaintenanceOrderHeader>
<MaintenanceOrderLine>
<LineNumber>10</LineNumber>
<RemainingDuration>PT8H0M0S</RemainingDuration>
<ActivityDeferredIndicator>false</ActivityDeferredIndicator>
<UserArea>
<EamCheckListInfo>
<CheckList>
<CheckListItem>
<Sequence>40</Sequence>
<Description>Half joints (Superstructure elements)</Description>
</CheckListItem>
<CheckListItem>
<Sequence>160</Sequence>
<Description>Substructure drainage (Durability elements)</Description>
</CheckListItem>
<CheckListItem>
<Sequence>60</Sequence>
<Description>Parapet beam or cantilever (Superstructure elements)</Description>
</CheckListItem>
</CheckList>
</EamCheckListInfo>
</UserArea>
</MaintenanceOrderLine>
</MaintenanceOrder>
</DataArea>
</SyncMaintenanceOrder>
I need a sample of XSLT code that selects only the text node described above.
I'm not sure I really get your question.
This will print the Description of the CheckListItem which has a Sequence of 40:
<xsl:value-of select="//CheckListItem/Sequence[text()='40']/../Description"/>
Try it here: https://xsltfiddle.liberty-development.net/ehVZvvZ
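For comparison, the same selection can be sketched outside XSLT with Python's xml.etree.ElementTree, whose limited XPath support includes the [child='text'] predicate. The function name description_for_sequence and the trimmed SAMPLE document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (assumption: only the relevant branch is kept)
SAMPLE = """<SyncMaintenanceOrder><DataArea><MaintenanceOrder><MaintenanceOrderLine><UserArea>
<EamCheckListInfo><CheckList>
  <CheckListItem><Sequence>40</Sequence><Description>Half joints (Superstructure elements)</Description></CheckListItem>
  <CheckListItem><Sequence>160</Sequence><Description>Substructure drainage (Durability elements)</Description></CheckListItem>
</CheckList></EamCheckListInfo>
</UserArea></MaintenanceOrderLine></MaintenanceOrder></DataArea></SyncMaintenanceOrder>"""


def description_for_sequence(xml_text, sequence):
    """Return the Description text of the CheckListItem with the given Sequence, or None."""
    root = ET.fromstring(xml_text)
    # The predicate keeps only CheckListItem elements whose Sequence child has exactly this text
    node = root.find(".//CheckListItem[Sequence='%s']/Description" % sequence)
    return None if node is None else node.text


print(description_for_sequence(SAMPLE, "40"))  # Half joints (Superstructure elements)
```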
Try the code below:
<xsl:value-of select="*[local-name(.)='SyncMaintenanceOrder']/*[local-name(.)='DataArea']/*[local-name(.)='MaintenanceOrder']/*[local-name(.)='MaintenanceOrderLine']/*[local-name(.)='UserArea']/*[local-name(.)='EamCheckListInfo']/*[local-name(.)='CheckList']/*[local-name(.)='CheckListItem']/*[local-name(.)='Sequence'][text() = '40']/../*[local-name(.)='Description']" />
So I have this HL7-type message that I have to transform using regex, XSLT, or a combination of the two.
The format of this message is DateTime (as in YYYYMMDDHHMMSS)^UnitName^room^bed|. Each location is terminated by a pipe, and each person can have one or more locations.
The message looks like this when a patient has only one location:
20130602201605^Some Hospital^ABFG^411|
The resulting XML should look like this:
<Location>
<item>
<when>20130602201605</when>
<UnitName>Some Hospital</UnitName>
<room>ABFG</room>
<bed>411</bed>
</item>
</Location>
I would probably use a substring-type function if there were only one location.
The problem I am running into is when there is more than one. I am relatively new to XSLT and regex in general, so I don't know how to use recursion in these instances.
So if I have a message like this with multiple locations:
20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|20130602150003^John Hopkins^J615^A|
The end result should be:
<Location>
<item>
<when>20130601003203</when>
<UnitName>GBMC</UnitName>
<room>XXYZ</room>
<bed>110</bed>
</item>
<item>
<when>20130602130600</when>
<UnitName>Sanai</UnitName>
<room>ABC</room>
<bed></bed>
</item>
<item>
<when>20130602150003</when>
<UnitName>John Hopkins</UnitName>
<room>J615</room>
<bed>A</bed>
</item>
</Location>
So how would I solve this? Thanks in advance.
Given that your HL7 message is "|^~\&" encoded and not in an XML format, it is not clear how you will be using an XSLT 1.0 processor for your task. Can you describe your processing pipeline in greater detail? Your snippets are not complete messages, and it is not clear whether you will be starting with complete messages or attempting to parse isolated fields handed to a larger processing task through parameters or something.
If your processing starts with a complete HL7 message, I would suggest looking into the HAPI project, or a similar set of libraries, to have the messages converted from |^~\& to </> format, then invoking your XSLT on that version of the data. (You could also use the HAPI libraries in a full-Java solution. In either case, there are code examples at the HAPI site and at an Apache site on HL7.) If you are not interested in using Java at all, but are open to partial non-XSLT solutions, there are other projects that provide similar serialization options (e.g., Net::HL7 for Perl, nHAPI for VB/C#, etc.).
If you have isolated "|^~\&" encoded data in an otherwise XML formatted file, then I would suggest looking into the str:tokenize function in the XSLT 1.0 exslt functions. (XSLT 2.0 has a built-in tokenize function.) You can have str:tokenize split your data on the field or component separators, then create elements using the tokenized substrings.
Here is a stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
extension-element-prefixes="str"
version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="data">
<Location>
<xsl:for-each select="str:tokenize(.,'|')">
<xsl:call-template name="handle-field">
<xsl:with-param name="field" select="."/>
</xsl:call-template>
</xsl:for-each>
</Location>
</xsl:template>
<xsl:template name="handle-field">
<xsl:param name="field"/>
<xsl:variable name="components" select="str:tokenize($field,'^')"/>
<item>
<when><xsl:value-of select="$components[1]"/></when>
<UnitName><xsl:value-of select="$components[2]"/></UnitName>
<room><xsl:value-of select="$components[3]"/></room>
<bed><xsl:value-of select="$components[4]"/></bed>
</item>
</xsl:template>
</xsl:stylesheet>
that runs over this input
<?xml version="1.0" encoding="UTF-8"?>
<data>20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|20130602150003^John Hopkins^J615^A|</data>
to produce this output with xsltproc:
<?xml version="1.0"?>
<Location>
<item>
<when>20130601003203</when>
<UnitName>GBMC</UnitName>
<room>XXYZ</room>
<bed>110</bed>
</item>
<item>
<when>20130602130600</when>
<UnitName>Sanai</UnitName>
<room>ABC</room>
<bed/>
</item>
<item>
<when>20130602150003</when>
<UnitName>John Hopkins</UnitName>
<room>J615</room>
<bed>A</bed>
</item>
</Location>
Your source message is a string, so you need to create a parser that uses regex to split the message first on pipes and then on carets. Refer to "Unable to parse ^ character", which has my original code for the parser; the solution there takes a different approach.
After you have the individual elements, you need to add them to your XML as nodes.
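The split-then-build idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name locations_to_xml and the FIELDS tuple are made up here, plain str.split stands in for a regex because the separators are single characters, and the input is assumed to be the isolated location field rather than a complete HL7 message.

```python
import xml.etree.ElementTree as ET

# Element names for the four caret-separated components, in order
FIELDS = ("when", "UnitName", "room", "bed")


def locations_to_xml(message):
    """Build a <Location> element with one <item> per pipe-terminated record."""
    location = ET.Element("Location")
    for record in message.split("|"):
        if not record:
            continue  # skip the empty piece that follows the trailing pipe
        item = ET.SubElement(location, "item")
        parts = record.split("^")
        # Pad short records so every item gets all four child elements
        parts += [""] * (len(FIELDS) - len(parts))
        for name, value in zip(FIELDS, parts):
            ET.SubElement(item, name).text = value
    return location


msg = "20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|20130602150003^John Hopkins^J615^A|"
print(ET.tostring(locations_to_xml(msg), encoding="unicode"))
```

Because str.split keeps empty strings, a missing bed (as in the second record) still produces an empty bed element instead of shifting the other fields.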
I have the following XML, which contains several tags with xsi:nil="true". These tags are basically null. I cannot find any XSLT transformer that can remove these tags from the XML and give me the rest of the document.
<?xml version="1.0" encoding="utf-8"?>
<p849:retrieveAllValues xmlns:p849="http://package.de.bc.a">
<retrieveAllValues>
<messages xsi:nil="true" />
<existingValues>
<Values>
<value1> 10.00</value1>
<value2>123456</value2>
<value3>1234</value3>
<value4 xsi:nil="true" />
<value5 />
</Values>
</existingValues>
<otherValues xsi:nil="true" />
<recValues xsi:nil="true" />
</retrieveAllValues>
</p849:retrieveAllValues>
The reason for the error you get
[Fatal Error] file2.xml:5:30: The prefix "xsi" for attribute "xsi:nil" associated with an element type "messages" is not bound.
is that no prefix named "xsi" is declared. You should declare it in the root element, like this:
<p849:retrieveAllValues xmlns:p849="http://package.de.bc.a"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<retrieveAllValues>
<messages xsi:nil="true" />
// other code...
Update
If you cannot change the XML document you receive from the web service, you could try the following approach (if it is acceptable for you):
Change your XSLT document to process XML documents without specifying element prefixes
Set the namespaceAware property of DocumentBuilderFactory to false
After this, your transformer shouldn't complain.
It doesn't look like this is going to be possible in XSLT. Because of the missing namespace declarations, you have to parse the XML file with a non-namespace-aware parser, but none of the XSLT processors I've tried get on well with such documents. They must rely on some information that is only present when parsing with namespace awareness enabled, even if the document in question doesn't actually contain any namespaced nodes.
So you'll have to approach it a different way, for example by traversing the DOM tree yourself. Since you say you're working in Java, here's an example using Java DOM APIs (the example runs as-is in the Groovy console, or wrap it up in a proper class definition and add whatever exception handling is required to run it as Java)
import java.io.File;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.w3c.dom.ls.*;
public void stripNils(Node n) {
    if(n instanceof Element &&
       "true".equals(((Element)n).getAttribute("xsi:nil"))) {
        // element is xsi:nil - strip it out
        n.getParentNode().removeChild(n);
    } else {
        // we're keeping this node, process its children (if any) recursively;
        // iterate backwards because the NodeList is live and removing a child
        // shifts the indices of the siblings after it
        NodeList children = n.getChildNodes();
        for(int i = children.getLength() - 1; i >= 0; i--) {
            stripNils(children.item(i));
        }
    }
}
// load the document (NB DBF is non-namespace-aware by default)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document xmlDoc = db.parse(new File("input.xml"));
stripNils(xmlDoc);
// write out the modified document, in this example to stdout
LSSerializer ser =
    ((DOMImplementationLS)xmlDoc.getImplementation()).createLSSerializer();
LSOutput out =
    ((DOMImplementationLS)xmlDoc.getImplementation()).createLSOutput();
out.setByteStream(System.out);
ser.write(xmlDoc, out);
On your original example XML this produces the correct result:
<?xml version="1.0" encoding="UTF-8"?>
<p849:retrieveAllValues xmlns:p849="http://package.de.bc.a">
<retrieveAllValues>
<existingValues>
<Values>
<value1> 10.00</value1>
<value2>123456</value2>
<value3>1234</value3>
<value5/>
</Values>
</existingValues>
</retrieveAllValues>
</p849:retrieveAllValues>
The empty lines are not actually empty, they contain the whitespace text nodes either side of the removed elements, as only the elements themselves are being removed here.
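The same traversal can be sketched in Python with xml.etree.ElementTree. Its parser is namespace-aware, so this sketch assumes the xsi prefix has been declared on the root element, as the first answer suggests; the original document without the declaration would not parse. The function name strip_nils and the shortened SAMPLE document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xsi:nil attribute as ElementTree stores it
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Shortened sample with the xsi prefix declared (assumption: the declaration was added)
SAMPLE = """<p849:retrieveAllValues xmlns:p849="http://package.de.bc.a"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <retrieveAllValues>
    <messages xsi:nil="true" />
    <existingValues>
      <Values>
        <value3>1234</value3>
        <value4 xsi:nil="true" />
        <value5 />
      </Values>
    </existingValues>
    <otherValues xsi:nil="true" /><recValues xsi:nil="true" />
  </retrieveAllValues>
</p849:retrieveAllValues>"""


def strip_nils(parent):
    """Remove, in place, every descendant element that carries xsi:nil="true"."""
    # Iterate over a copy of the child list, since children are removed during the loop
    for child in list(parent):
        if child.get(XSI_NIL) == "true":
            parent.remove(child)
        else:
            strip_nils(child)


root = ET.fromstring(SAMPLE)
strip_nils(root)
print([el.tag for el in root.iter()])
```

Iterating over list(parent) rather than parent itself keeps adjacent nil elements (such as otherValues and recValues above) from being skipped when one of them is removed.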
I can't get this working for the life of me. Here is a snippet of the XML I get from an RSS feed from the iTunes affiliate. I want to print the values within the tags, but for some reason I cannot:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<id>http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/toppaidapplications/sf=143441/limit=100/genre=6014/xml</id><title>iTunes Store: Top Paid Applications</title><updated>2010-03-24T15:36:42-07:00</updated><link rel="alternate" type="text/html" href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?id=25180&amp;popId=30"/><link rel="self" href="http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/toppaidapplications/sf=143441/limit=100/genre=6014/xml"/><icon>http://phobos.apple.com/favicon.ico</icon><author><name>iTunes Store</name><uri>http://www.apple.com/itunes/</uri></author><rights>Copyright 2008 Apple Inc.</rights>
<entry>
<updated>date</updated>
<id>someID</id>
<title>a</title>
<im:name>b</im:name>
</entry>
<entry>
<updated>date2</updated>
<id>someID2</id>
<title>a2</title>
<im:name>b2</im:name>
</entry>
</feed>
If I try <xsl:apply-templates match="entry"/>, it spits out the entire contents of the file. If I use <xsl:call-template name="entry">, it shows only one entry, and I have to use <xsl:value-of select="//*[local-name(.)='name']"/> to get the name, but that's a hack. I've used XSLT before on XML without namespaces and on XML with proper parent-child relationships, but not on anything like this RSS feed. Notice that entry is not wrapped in an entries element or anything similar.
Any help is appreciated. I want to use XSLT because I want to alter the iTunes link to go through my affiliate account, so something automated wouldn't work for me.
You are matching elements that are in no namespace, but the actual elements in the XML document belong to a (default) namespace: xmlns="http://www.w3.org/2005/Atom".
Therefore, you need to declare the namespace in your stylesheet, say as xmlns:atom="http://www.w3.org/2005/Atom", and then match not on {elementName} but on atom:{elementName}, where {elementName} in your case is "entry".
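The prefix-to-namespace mapping described here works the same way outside XSLT. As a rough illustration in Python's xml.etree.ElementTree (the names NS, FEED and entry_names are assumptions, and FEED is a cut-down version of the question's feed), both the default Atom namespace and the im namespace get a prefix in the query:

```python
import xml.etree.ElementTree as ET

# Prefixes used in the queries; they only need to map to the right namespace URIs
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "im": "http://itunes.apple.com/rss",
}

# Cut-down version of the feed from the question
FEED = """<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>iTunes Store: Top Paid Applications</title>
  <entry><id>someID</id><title>a</title><im:name>b</im:name></entry>
  <entry><id>someID2</id><title>a2</title><im:name>b2</im:name></entry>
</feed>"""


def entry_names(feed_text):
    """Return (title, im:name) pairs for every Atom entry in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (entry.findtext("atom:title", namespaces=NS),
         entry.findtext("im:name", namespaces=NS))
        for entry in root.findall("atom:entry", NS)
    ]


print(entry_names(FEED))  # [('a', 'b'), ('a2', 'b2')]
```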
I am working on an automated testing app, and am currently in the process of writing a function that compares values between two XML files that should be identical, but may not be. Here is a sample of the XML I'm trying to process:
<?xml version="1.0" encoding="utf-8"?>
<report xmlns="http://www.**.com/**">
<subreport name="RBDReport">
<record rowNumber="1">
<field name="Time">
<value>0</value>
</field>
<field name="Reliability">
<value>1.000000</value>
</field>
<field name="Unreliability">
<value>0.000000</value>
</field>
<field name="Availability">
<value> </value>
</field>
<field name="Unavailability">
<value> </value>
</field>
<field name="Failure Rate">
<value>N/A</value>
</field>
<field name="Number of Failures">
<value> </value>
</field>
<field name="Total Downtime">
<value> </value>
</field>
</record>
(Note there may be multiple <subreport> elements and within those, multiple <record> elements.)
What I'd like is to extract the <value> tags of two documents and then compare their values. That part I know how to do. The problem is the extraction itself.
Since I'm stuck in C++, I'm using MSXML, and have written a wrapper to allow my app to abstract away the actual XML manipulation, in case I ever decide to change my data format.
That wrapper, CSimpleXMLParser, loads an XML document and sets its "top record" to the document element of the XML document. (CRecord being an abstract class with CXMLRecord one of its subclasses, and which gives access to child records singularly or by group, and also allowing access to the "value" of the Record (values for child elements or attributes, in the case of CXMLRecord.) A CXMLRecord contains an MSXML::MSXMLDOMNodePtr and a pointer to an instance of a CSimpleXMLParser.) The wrapper also contains utility functions for returning children, which the CXMLRecord uses to return its child records.
In my code, I do the following (trying to return all <subreport> nodes just to see if it works):
CSimpleXMLParser parserReportData;
parserReportData.OpenXMLDocument(strPathToXML);
bool bGetChildrenSuccess = parserReportData.GetFirstRecord()->GetChildRecords(listpChildren, _T("subreport"));
This is always returning false. The meat of the implementation of CXMLRecord::GetChildRecords() is basically
MSXML2::IXMLDOMNodeListPtr pListChildren = m_pParser->SelectNodes(strPath, m_pXMLNode);
if (pListChildren->Getlength() == 0)
{
    return false;
}
for (long l = 0; l < pListChildren->Getlength(); ++l)
{
    listRecords.push_back(new CXMLRecord(pListChildren->Getitem(l), m_pParser));
}
return true;
And CSimpleXMLParser::SelectNodes() is:
MSXML2::IXMLDOMNodeListPtr CSimpleXMLParser::SelectNodes(LPCTSTR strXPathFilter, MSXML2::IXMLDOMNodePtr pXMLNode)
{
    return pXMLNode->selectNodes(_bstr_t(strXPathFilter));
}
When run, the top record is definitely being set to the <report> element properly. I can do all sorts of things with it, like getting its child nodes (through the MSXML interface, not through my wrapper) or its name, etc. I know that my wrapper can work, because I use it elsewhere in the app for parsing an XML configuration file, and that works flawlessly.
I thought maybe I was doing something faulty with the XPath query expression, but every permutation I could think of gives no joy. The MSXML::IXMLDOMNodeListPtr returned by IXMLDOMNodePtr::SelectNodes() is always of length 0 when I try to deal with this XML file.
This is driving me crazy.
I'm used to doing this with .NET's XmlDocument objects, but I think the effect is the same here:
If the XML document includes a namespace, even an unnamed (default) one, then the XPath query has to use one as well. So you'll have to add the namespace to the XML document object, giving it a prefix in the code, and then include that prefix in the XPath query. (It doesn't matter that the prefixes differ between the XML document and the XPath, as long as they map to the same namespace.)
So, where you are using an XPath like /report/subreport/record/field/value, you actually need to first set the selection namespace on your document:
pXMLDoc->setProperty(_bstr_t("SelectionNamespaces"),
                     _bstr_t("xmlns:r='http://www.**.com/**'"));
and then selectNodes() using /r:report/r:subreport/r:record/r:field/r:value
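The point that the query prefix is independent of the prefix used in the document can be illustrated with a small Python sketch. It uses xml.etree.ElementTree rather than MSXML, and the namespace URI urn:example:report, the two tiny documents and the function count_subreports are all hypothetical stand-ins, since the real URI is elided in the question.

```python
import xml.etree.ElementTree as ET

REPORT_NS = "urn:example:report"  # hypothetical URI, not the one from the question

# Same namespace declared two ways: as the default namespace and with prefix "x"
DEFAULT_DOC = '<report xmlns="urn:example:report"><subreport name="RBDReport"/></report>'
PREFIXED_DOC = '<x:report xmlns:x="urn:example:report"><x:subreport name="RBDReport"/></x:report>'


def count_subreports(xml_text, path, namespaces=None):
    """Count the children of the document element matched by the given path."""
    root = ET.fromstring(xml_text)
    return len(root.findall(path, namespaces))


# An unprefixed query looks for elements in no namespace and finds nothing
print(count_subreports(DEFAULT_DOC, "subreport"))
# The query prefix "r" matches regardless of how the document spells the prefix
print(count_subreports(DEFAULT_DOC, "r:subreport", {"r": REPORT_NS}))
print(count_subreports(PREFIXED_DOC, "r:subreport", {"r": REPORT_NS}))
```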
I see no reference to a namespace when you're selecting nodes. I'd expect this to be the fundamental problem.