XPATH selecting entire tree including only the first - xslt

Given the following structure, I want to use XPath to select the entire tree but include only the first date, excluding all of the other dates. The number of dates after the first one is not constant. Any ideas? My apologies if the format isn't correct.
<A>
<B>
<DATE>04272011</DATE>
<C>
<D>
<DATE>02022011</DATE>
</D>
<D>
<DATE>03142011</DATE>
</D>
</C>
</B>
</A>
My apologies.
A better example
<NOTICES>
<SNOTE>
<DATE>01272011</DATE>
<ZIP>35807</ZIP>
<CLASSCOD>A</CLASSCOD>
<EMAIL>
<ADDRESS>address 1</ADDRESS>
</EMAIL>
<CHANGES>
<MOD>
<DATE>02022011</DATE>
<MODNUM>12345</MODNUM>
<EMAIL>
<ADDRESS>address 2</ADDRESS>
</EMAIL>
</MOD>
<MOD>
<DATE>03022011</DATE>
<MODNUM>56789</MODNUM>
<EMAIL>
<ADDRESS>address 3</ADDRESS>
</EMAIL>
</MOD>
</CHANGES>
</SNOTE>
</NOTICES>
I'm breaking up one large xml file into individual XML files. My original XPATH statement is
/NOTICES/SNOTE
Each individual XML file looks fine, except that it pulls in all of the dates. This is my desired output:
<SNOTE>
<DATE>01272011</DATE>
<ZIP>35807</ZIP>
<CLASSCOD>A</CLASSCOD>
<EMAIL>
<ADDRESS>address 1</ADDRESS>
</EMAIL>
<CHANGES>
<MOD>
<MODNUM>12345</MODNUM>
<EMAIL>
<ADDRESS>address 2</ADDRESS>
</EMAIL>
</MOD>
<MOD>
<MODNUM>56789</MODNUM>
<EMAIL>
<ADDRESS>address 3</ADDRESS>
</EMAIL>
</MOD>
</CHANGES>
</SNOTE>

XPath is a query language for XML documents and as such it cannot alter the structure of the document (such as insert/delete/rename nodes).
What you need is an XSLT transformation -- as simple as this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="DATE[preceding::DATE]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<A>
<B>
<DATE>04272011</DATE>
<C>
<D>
<DATE>02022011</DATE>
</D>
<D>
<DATE>03142011</DATE>
</D>
</C>
</B>
</A>
the wanted, correct result is produced:
<A>
<B>
<DATE>04272011</DATE>
<C>
<D/>
<D/>
</C>
</B>
</A>

If by "select the entire tree" you mean "select the set of all the nodes in the tree" (except the non-first DATE elements), that can be done:
"//node()[not(self::DATE) or not(preceding::DATE)]"
Then, the non-first <DATE> element nodes will not themselves be in the selected nodeset, but nodes in the selected nodeset (such as the root node, or <D>) will still have <DATE> descendants.
If instead you want to select the tree (i.e. the root node), or rather a modified version of it, such that <D> elements do not have any <DATE> children, then that requires modification of the tree. XPath can't modify XML trees by itself. You need an XML transformation technology, such as XSLT or an XML DOM library.
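As a rough illustration of the "XML DOM library" route mentioned above, here is a minimal Python sketch using the standard-library xml.etree.ElementTree. The function name drop_later_dates is made up for this example, and it assumes the element to remove is literally named DATE, as in the question.

```python
import xml.etree.ElementTree as ET


def drop_later_dates(root):
    """Remove every DATE element except the first one in document order."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # iter() walks the tree in document order, so dates[0] is the one to keep.
    dates = list(root.iter("DATE"))
    for extra in dates[1:]:
        # remove() compares by identity and also drops the element's tail text.
        parent_of[extra].remove(extra)
    return root


if __name__ == "__main__":
    sample = (
        "<A><B><DATE>04272011</DATE><C>"
        "<D><DATE>02022011</DATE></D>"
        "<D><DATE>03142011</DATE></D>"
        "</C></B></A>"
    )
    tree = drop_later_dates(ET.fromstring(sample))
    print(ET.tostring(tree, encoding="unicode"))
```

When splitting the NOTICES file, the same function could be called once per SNOTE element before each one is written out, so that only the SNOTE's own DATE survives.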

Related

How to find a space between elements in xslt

I want to remove the additional space in the 4th paragraph. It was introduced by the XSLT which I have written.
Given below the XML and XSLT output.
<Chapter>
<para>A <emphasis>B</emphasis> <emphasis>C</emphasis> <emphasis>N</emphasis> D</para>
<para>A <emphasis>B</emphasis> <emphasis>C</emphasis> <emphasis>N</emphasis> D</para>
<para>A <emphasis>B</emphasis> <emphasis>C</emphasis> <emphasis>N</emphasis> D</para>
<para>A <emphasis>B</emphasis><emphasis>C</emphasis> <emphasis>N</emphasis> D</para>
</Chapter>
Given the XSLT code below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="para" />
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="para">
<p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="emphasis">
<b><xsl:apply-templates/></b>
<xsl:if test="self::emphasis/following::text()[starts-with(., ' ')] and following-sibling::node()[1][self::emphasis]"><xsl:text> </xsl:text></xsl:if>
</xsl:template>
</xsl:stylesheet>
Expected output:
<p>A <b>B</b> <b>C</b> <b>N</b> D</p>
<p>A <b>B</b> <b>C</b> <b>N</b> D</p>
<p>A <b>B</b> <b>C</b> <b>N</b> D</p>
<p>A <b>B</b><b>C</b> <b>N</b> D</p>
I know the problem is with the strip-space declaration in the XSLT; that is why the spaces between those elements are removed. I tried to work around strip-space by adding a space between those elements, but an additional space is still added after B in the 4th paragraph. Does anyone know how to remove that unwanted space, or another way to keep the space between those elements?

Find Text Node with largest number of elements preceding

With XML:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<A>
<B>
<C>
<Name>Bob</Name>
</C>
<D>
<Operation>Yes</Operation>
<E>
<Operation>No</Operation>
</E>
</D>
</B>
</A>
</Root>
I have XSLT that produces text output:
/Root/A/B/C/Name
/Root/A/B/D/Operation
/Root/A/B/D/E/Operation
Problem:
The deepest text node is /Root/A/B/D/E/Operation.
I'd like to be able to arrive at the number 5 (the largest number of element levels in front of any text node) prior to producing the output above.
So it should work for any XML document. Element names are unknown.
The XSLT 3 stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="let $leaf-elements := //*[not(*)],
$max-anc := max($leaf-elements/count(ancestor::*))
return ($max-anc, $leaf-elements!string-join(ancestor-or-self::*/name(), '/'))" separator="&#10;"/>
</xsl:template>
</xsl:stylesheet>
run against your sample input outputs
5
Root/A/B/C/Name
Root/A/B/D/Operation
Root/A/B/D/E/Operation
Online sample https://xsltfiddle.liberty-development.net/6qVRKwh.
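For comparison, the same idea (find the leaf elements, count their ancestors, take the maximum) can be sketched outside XSLT. This is only an illustration in Python with the standard-library xml.etree.ElementTree; the function name leaf_paths is invented for this example.

```python
import xml.etree.ElementTree as ET


def leaf_paths(elem, ancestors=()):
    """Yield (ancestor_count, path) for every element without child elements."""
    names = ancestors + (elem.tag,)
    if len(elem) == 0:
        # A leaf: its depth is the number of element ancestors above it.
        yield len(ancestors), "/" + "/".join(names)
    for child in elem:
        yield from leaf_paths(child, names)


if __name__ == "__main__":
    sample = (
        "<Root><A><B><C><Name>Bob</Name></C>"
        "<D><Operation>Yes</Operation><E><Operation>No</Operation></E></D>"
        "</B></A></Root>"
    )
    results = list(leaf_paths(ET.fromstring(sample)))
    print(max(depth for depth, _ in results))
    for _, path in results:
        print(path)
```

The depth is tracked explicitly rather than derived from the path string, so element names are never parsed; no particular names are assumed.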

XSLT insert at right location

My input XML is as follows. It contains several <servlet> elements. I need an XSLT transform that looks through the <servlet-name> elements to see whether a servlet with a specified name exists. If it does, I need to check whether that servlet has a <B> element with a specific <param-name>. If no such <B> exists, I add <B>NEW</B> alongside the other <B> elements of that servlet; otherwise I do nothing.
INPUT XML
<web-app metadata-complete="true">
<servlet>
<servlet-name>AAA</servlet-name>
<servlet-class>com.AAA</servlet-class>
<B>
<param-name>port</param-name>
<param-value>8802</param-value>
</B>
<B>
<param-name>connectors-xml</param-name>
<param-value/>
</B>
<B>
<param-name>webservices-xml</param-name>
<param-value/>
</B>
<B>
<param-name>exposure-server</param-name>
<param-value/>
</B>
<some-tag>1</some-tag>
</servlet>
<servlet>
<servlet-name>BBB</servlet-name>
<servlet-class>com.BBB</servlet-class>
<B>
<param-name>port</param-name>
<param-value>8802</param-value>
</B>
<B>
<param-name>connectors-xml</param-name>
<param-value/>
</B>
<B>
<param-name>webservices-xml</param-name>
<param-value/>
</B>
<B>
<param-name>exposure-server</param-name>
<param-value/>
</B>
<some-tag>2</some-tag>
</servlet>
<C>
<D>
</D>
</C>
<junk-tag>
<tag1>BASIC</tag1>
<tag2>BASIC</tag2>
</junk-tag>
</web-app>
For example, say I search for the servlet named "BBB". If it is found and it has no <B> element whose <param-name> is XXX, I add the new element so that the output looks as below. If the "BBB" servlet already has a <B> element with <param-name> XXX, I do nothing.
OUTPUT.XML
<web-app metadata-complete="true">
<servlet>
<servlet-name>AAA</servlet-name>
<servlet-class>com.AAA</servlet-class>
<B>
<param-name>port</param-name>
<param-value>8802</param-value>
</B>
<B>
<param-name>connectors-xml</param-name>
<param-value/>
</B>
<B>
<param-name>webservices-xml</param-name>
<param-value/>
</B>
<B>
<param-name>exposure-server</param-name>
<param-value/>
</B>
<some-tag>1</some-tag>
</servlet>
<servlet>
<servlet-name>BBB</servlet-name>
<servlet-class>com.BBB</servlet-class>
<B>
<param-name>port</param-name>
<param-value>8802</param-value>
</B>
<B>
<param-name>connectors-xml</param-name>
<param-value/>
</B>
<B>
<param-name>webservices-xml</param-name>
<param-value/>
</B>
<B>
<param-name>exposure-server</param-name>
<param-value/>
</B>
<B>NEW</B>
<some-tag>2</some-tag>
</servlet>
<C>
<D>
</D>
</C>
<junk-tag>
<tag1>BASIC</tag1>
<tag2>BASIC</tag2>
</junk-tag>
</web-app>
I have tried writing an XSLT but got stuck on bugs and syntax issues:
<xsl:template match="web-app/servlet[servlet-name='BBB/B']">
<xsl:copy-of select="."/>
<xsl:choose>
<xsl:when test="not(/web-app/servlet[servlet-name='BBB']/B[param-name='XXX'])">
<B>NEW</B>
</xsl:when>
</xsl:choose>
</xsl:template>
Any guidance? I am a novice at XSLT and am learning by googling.
The expression you want is just not(B[param-name='XXX']): you are already positioned on the relevant servlet at that point, so the expression is evaluated relative to it. Additionally, your current code copies the existing servlet and adds <B>NEW</B> after it, when in fact you want it as a child.
So you could do this....
<xsl:template match="web-app/servlet[servlet-name='BBB']">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<xsl:if test="not(B[param-name='XXX'])">
<B>NEW</B>
</xsl:if>
</xsl:copy>
</xsl:template>
Or better still, put the check in the template match itself
<xsl:template match="web-app/servlet[servlet-name='BBB'][not(B[param-name='XXX'])]">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<B>NEW</B>
</xsl:copy>
</xsl:template>
(Both of these assume you are also using the identity template)
However, this adds your new tag after <some-tag>2</some-tag> which may not be what you want.
If you want to place it after the last B element, you should change the template to match the last B element instead.
<xsl:template match="web-app/servlet[servlet-name='BBB'][not(B[param-name='XXX'])]/B[last()]">
<xsl:copy-of select="." />
<B>NEW</B>
</xsl:template>
Try this XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="web-app/servlet[servlet-name='BBB'][not(B[param-name='XXX'])]/B[last()]">
<xsl:copy-of select="." />
<B>NEW</B>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
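If it helps to see the same check-then-insert logic outside XSLT, here is a small Python sketch using the standard-library xml.etree.ElementTree. The function name add_param_block is invented for this example. Unlike the B[last()] template above, it appends the new element at the end of the servlet when the servlet has no <B> children at all; that fallback is my own assumption, not something the question asks for.

```python
import xml.etree.ElementTree as ET


def add_param_block(root, servlet_name, param_name):
    """Insert <B>NEW</B> after the last B of the named servlet, if needed.

    Returns True when an element was inserted, False otherwise.
    """
    for servlet in root.iter("servlet"):
        if servlet.findtext("servlet-name") != servlet_name:
            continue
        blocks = servlet.findall("B")
        # Do nothing when a B with the given param-name already exists.
        if any(b.findtext("param-name") == param_name for b in blocks):
            return False
        new_block = ET.Element("B")
        new_block.text = "NEW"
        if blocks:
            # Place the new element directly after the last existing B.
            position = list(servlet).index(blocks[-1]) + 1
        else:
            # Assumed fallback: no B children, so append at the end.
            position = len(servlet)
        servlet.insert(position, new_block)
        return True
    return False


if __name__ == "__main__":
    doc = ET.fromstring(
        "<web-app><servlet><servlet-name>BBB</servlet-name>"
        "<B><param-name>port</param-name></B>"
        "<some-tag>2</some-tag></servlet></web-app>"
    )
    add_param_block(doc, "BBB", "XXX")
    print(ET.tostring(doc, encoding="unicode"))
```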

generate-id() too slow for large document

I have a large xml document containing annotated speech transcripts. Following is a short fragment.
<?xml version="1.0" encoding="UTF-8"?>
<U>
<A/>
<C type="start" id="cb01s"/>
<P/>
<T>a</T>
<T>woman</T>
<P/>
<T>took</T>
<T>off</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb02e"/>
<P/>
<T>but</T>
<P/>
<F/>
<RT>
<O>
<C type="start" id="cb03s"/>
<T>her</T>
<T>bag</T>
<P/>
<T>are</T>
</O>
<P/>
<E>
<C type="start" id="cb04s"/>
<T>her</T>
<T>bag</T>
<T>are</T>
</E>
</RT>
<P/>
<T>still</T>
<P/>
<T>in</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb05e"/>
<PC>.</PC>
</U>
The basic task I need to do is to get the number of <T> nodes between certain pairs of <C> nodes. I've used the following stylesheet fragment to do this (illustrating with one specific pair of <C> nodes).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-node" select="descendant::C[@id = 'cb01s']"/>
<xsl:variable name="end-node" select="descendant::C[@id = 'cb02e']"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count($start-node/following::T[following::C[generate-id(.) = generate-id($end-node)]])"/>
</xsl:template>
</xsl:stylesheet>
This works fine on such a short XML fragment as above and gives the correct result: Result: 6.
However, the actual XML document contains tens of thousands of <C> nodes and even more <T> nodes. So when I try to run the stylesheet on it the result comes back very slowly. (It would probably take days to finish completely.) I suppose the problem must be that on each run of the <xsl:value-of... line, the processor (Saxon) is checking all <T> nodes and generating IDs for <C> nodes multiple times (i.e., exponentially), and that slows everything down.
Is there a way to speed up the process while still using generate-id()? Or do I need to get the number of <T> nodes with some alternate approach?
You do not need generate-id() just to avoid matching <C> elements intervening between the start and end nodes. You are matching <C> elements by their id attributes in the first place, and I see no reason not to use that more directly. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-id" select="'cb01s'"/>
<xsl:variable name="end-id" select="'cb02e'"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count(descendant::C[@id = $start-id]/following::T[following::C[@id = $end-id][1]])"/>
</xsl:template>
</xsl:stylesheet>
You can simplify that by removing the [1] position predicate if you can rely on the @id values of the <C> elements to be unique in the document.
If generate-id() is indeed the primary cause of your performance problem, then avoiding it altogether ought to provide a big boost.
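As for an alternate approach: nesting one following:: step inside another can make the processor rescan the rest of the document for every <T>, whereas a single walk in document order only visits each element once. Below is a minimal sketch of that idea in Python with the standard-library xml.etree.ElementTree, not XSLT. The function name count_t_between is invented here, and the sketch assumes the id values are unique. I have not timed it against the real data.

```python
import xml.etree.ElementTree as ET


def count_t_between(root, start_id, end_id):
    """Count T elements between the C start marker and the C end marker.

    Returns 0 when the end marker does not occur after the start marker.
    """
    counting = False
    total = 0
    # iter() visits every element exactly once, in document order.
    for elem in root.iter():
        if elem.tag == "C":
            marker = elem.get("id")
            if counting and marker == end_id:
                return total
            if marker == start_id:
                counting = True
        elif counting and elem.tag == "T":
            total += 1
    return 0


if __name__ == "__main__":
    sample = (
        '<U><C type="start" id="cb01s"/><T>a</T><T>woman</T>'
        '<C type="end" id="cb02e"/><T>but</T></U>'
    )
    print(count_t_between(ET.fromstring(sample), "cb01s", "cb02e"))
```

To count many pairs at once, the same walk could keep a running T counter and record its value at each <C> marker; the count for a pair is then the difference between the two recorded values.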

XML to parse eCFR - again

Once more into the breach.
I am looking to derive ordered pairs of information from an XML source for use in a lookup table in a database. The XML is very flat as its structure relates instructions for typesetting the documents. Data is not differentiated except by its format in this XML. A sample of the XML is as follows:
<APPENDIX>
<EAR>Pt. 774, Supp. 1</EAR>
<HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
<HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
<HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
<FP SOURCE="FP-2">
<E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
</FP>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
</FP>
<FP SOURCE="FP-1">
<E T="04">License Requirements</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Reason for Control:</E> NS, AT, UN</FP>
<GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
<BOXHD>
<CHED H="1">Control(s)</CHED>
<CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
</BOXHD>
<ROW>
<ENT I="01">NS applies to entire entry</ENT>
<ENT>NS Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">AT applies to entire entry</ENT>
<ENT>AT Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">UN applies to entire entry</ENT>
<ENT>See § 746.1(b) for UN controls.</ENT>
</ROW>
</GPOTABLE>
<FP SOURCE="FP-1">
<E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">LVS:</E> $3,000 for 0A018.b</FP>
<FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
<FP SOURCE="FP-1">
<E T="03">GBS:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="03">CIV:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="04">List of Items Controlled</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
<FP SOURCE="FP-1">
<E T="03">Related Definitions:</E> N/A</FP>
<FP>
<E T="03">Items:</E> a. [Reserved]</FP>
<P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
<NOTE>
<HD SOURCE="HED">
<E T="03">Note:</E>
</HD>
<P>
<E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
</P>
<P>
<E T="03">a. Ammunition crimped without a projectile (blank star);</E>
</P>
</APPENDIX>
Also attached are two XSL samples. The first will obtain the ECCN numbers from the nodes FP/E where the attributes are "FP-2" and "02", respectively. The second uses an xsl:if statement to obtain the "Reasons for Control" also from node FP. In this latter case the IF statement is used to determine whether the E node within the FP node includes the "Reason/s for Control" text.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
<xsl:value-of select="."/>\n
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP[@SOURCE = 'FP-1']">
<xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
<xsl:value-of select="."/>\n
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The output that I need is an ordered pair of the preceding ECCN and Reasons for Control information. My thought would be that if one were to move down the list to each FP node and perform a test on its attributes, keeping the correct ones as suggested above by the XSL samples, I should get a 1D list of the necessary information with an ECCN followed by its matching Reasons for Control, if any. However, I get most of the text of the original XML with a whole lot of "Nothing" thrown in. In other words, I am apparently matching the FP nodes, but the 'when' statements are not being satisfied for some reason.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="FP">
<xsl:choose>
<xsl:when test="@Source='FP-2'">
<xsl:value-of select="."/>\n
</xsl:when>
<xsl:when test="@Source='FP-1'">
<xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
<xsl:value-of select="."/>\n
</xsl:if>
</xsl:when>
<xsl:otherwise>
Nothing
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
I believe that if I can obtain a 1D list as described above, that I would be able to later get this into a Filemaker database. Given these premises, can anyone offer any advice on how to proceed?
Here is what I think I understand from this very confusing description:
There are two types of nodes of interest here. The first can be selected by:
/APPENDIX/FP[@SOURCE='FP-2'][E[@T='02']]
and the second by:
/APPENDIX/FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:']
These nodes are siblings.
Each node of the second type is related to the nearest preceding-sibling node of the first type. Not every node of the first type has a related node of the second type.
Based on these assumptions, the following stylesheet:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="k" match="FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:']" use="generate-id(preceding-sibling::FP[@SOURCE='FP-2'][E[@T='02']][1])" />
<xsl:template match="/APPENDIX">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<xsl:for-each select="FP[@SOURCE='FP-2'][E[@T='02']]">
<ROW>
<COL><DATA><xsl:value-of select="substring(E[@T='02'], 1, 5)"/></DATA></COL>
<COL><DATA><xsl:value-of select="key('k', generate-id())/text()"/></DATA></COL>
</ROW>
</xsl:for-each>
</RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>
when applied to your input example (after correcting the unclosed <NOTE> element!), will produce:
Result
<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
<COL>
<DATA/>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
<COL>
<DATA> NS, AT, UN</DATA>
</COL>
</ROW>
</RESULTSET>
</FMPXMLRESULT>
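The pairing rule behind the key (each "Reason for Control" row belongs to the nearest preceding ECCN row) can also be written as one pass over the FP siblings. The following is only a sketch in Python with the standard-library xml.etree.ElementTree, under the same assumptions as the stylesheet. The names eccn_reason_pairs and REASON_LABELS are invented for this example. It also accepts the plural label "Reasons for Control:" that the question mentions, and it trims surrounding whitespace from the reason text, which the stylesheet does not do.

```python
import xml.etree.ElementTree as ET

# Labels taken from the question; the stylesheet above matches only the singular.
REASON_LABELS = {"Reason for Control:", "Reasons for Control:"}


def eccn_reason_pairs(appendix):
    """Return (eccn, reasons) tuples, pairing each reason with the last ECCN."""
    pairs = []
    for fp in appendix.findall("FP"):
        e = fp.find("E")
        if e is None:
            continue
        source = fp.get("SOURCE")
        label = (e.text or "").strip()
        if source == "FP-2" and e.get("T") == "02":
            # As in the stylesheet, take the first 5 characters as the ECCN.
            pairs.append([label[:5], ""])
        elif (source == "FP-1" and e.get("T") == "03"
              and label in REASON_LABELS and pairs):
            # The reason text follows the E element inside the same FP.
            pairs[-1][1] = (e.tail or "").strip()
    return [tuple(pair) for pair in pairs]


if __name__ == "__main__":
    sample = (
        '<APPENDIX>'
        '<FP SOURCE="FP-2"><E T="02">0A018Items on the list</E></FP>'
        '<FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>'
        '</APPENDIX>'
    )
    for eccn, reasons in eccn_reason_pairs(ET.fromstring(sample)):
        print(eccn, reasons, sep="\t")
```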