Find Text Node with largest number of elements preceding - xslt

With XML:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<A>
<B>
<C>
<Name>Bob</Name>
</C>
<D>
<Operation>Yes</Operation>
<E>
<Operation>No</Operation>
</E>
</D>
</B>
</A>
</Root>
I have XSLT that produces text output:
/Root/A/B/C/Name
/Root/A/B/D/Operation
/Root/A/B/D/E/Operation
Problem:
The deepest text node is /Root/A/B/D/E/Operation.
I'd like to be able to arrive at the number 5 (the largest number of element levels in front of any text node) prior to producing the output above.
So it should work for any XML document. Element names are unknown.

The XSLT 3 stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="let $leaf-elements := //*[not(*)],
$max-anc := max($leaf-elements/count(ancestor::*))
return ($max-anc, $leaf-elements!string-join(ancestor-or-self::*/name(), '/'))" separator="&#10;"/>
</xsl:template>
</xsl:stylesheet>
run against your sample input outputs
5
Root/A/B/C/Name
Root/A/B/D/Operation
Root/A/B/D/E/Operation
Online sample https://xsltfiddle.liberty-development.net/6qVRKwh.
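If only an XSLT 1.0 processor is available, max() and let are not usable. A minimal sketch of one common workaround, assuming the same notion of depth as above (the number of element ancestors of an element that has no element children): sort the leaf elements by depth in descending order and keep only the first one.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Visit the leaf elements (no element children), deepest first -->
    <xsl:for-each select="//*[not(*)]">
      <xsl:sort select="count(ancestor::*)" data-type="number" order="descending"/>
      <!-- After sorting, the first item carries the maximum depth -->
      <xsl:if test="position() = 1">
        <xsl:value-of select="count(ancestor::*)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input the deepest leaf is the Operation inside E, whose ancestors are Root, A, B, D and E, so this prints 5.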

Related

generate-id() too slow for large document

I have a large xml document containing annotated speech transcripts. Following is a short fragment.
<?xml version="1.0" encoding="UTF-8"?>
<U>
<A/>
<C type="start" id="cb01s"/>
<P/>
<T>a</T>
<T>woman</T>
<P/>
<T>took</T>
<T>off</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb02e"/>
<P/>
<T>but</T>
<P/>
<F/>
<RT>
<O>
<C type="start" id="cb03s"/>
<T>her</T>
<T>bag</T>
<P/>
<T>are</T>
</O>
<P/>
<E>
<C type="start" id="cb04s"/>
<T>her</T>
<T>bag</T>
<T>are</T>
</E>
</RT>
<P/>
<T>still</T>
<P/>
<T>in</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb05e"/>
<PC>.</PC>
</U>
The basic task I need to do is to get the number of <T> nodes between certain pairs of <C> nodes. I've used the following stylesheet fragment to do this (illustrating with one specific pair of <C> nodes).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-node" select="descendant::C[@id = 'cb01s']"/>
<xsl:variable name="end-node" select="descendant::C[@id = 'cb02e']"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count($start-node/following::T[following::C[generate-id(.) = generate-id($end-node)]])"/>
</xsl:template>
</xsl:stylesheet>
This works fine on such a short XML fragment as above and gives the correct result: Result: 6.
However, the actual XML document contains tens of thousands of <C> nodes and even more <T> nodes, so when I run the stylesheet on it the result comes back very slowly. (It would probably take days to finish.) I suppose the problem is that on each evaluation of the <xsl:value-of... line, the processor (Saxon) checks all <T> nodes and generates IDs for <C> nodes multiple times (i.e., exponentially), and that slows everything down.
Is there a way to speed up the process while still using generate-id()? Or do I need to get the number of <T> nodes with some alternate approach?
You do not need generate-id() just to avoid matching <C> elements intervening between the start and end nodes. You are matching <C> elements by their id attributes in the first place, and I see no reason not to use that more directly. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
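<!-- The ids are string literals, so they need inner quotes; an unquoted value would be read as a child element name -->
<xsl:variable name="start-id" select="'cb01s'"/>
<xsl:variable name="end-id" select="'cb02e'"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count(descendant::C[@id = $start-id]/following::T[following::C[@id = $end-id][1]])"/>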
</xsl:template>
</xsl:stylesheet>
You can simplify that by removing the [1] position predicate if you can rely on the <C> element id attributes to be unique in the document.
If generate-id() is indeed the primary cause of your performance problem, then avoiding it altogether ought to provide a big boost.
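If the nested following:: predicate turns out to be the real cost rather than generate-id(), a different technique (not the one shown above) is to subtract two counts. This is only a sketch under stated assumptions: <C> elements are empty, a <T> never contains a <C>, the id values are unique in the document, and the start node comes before the end node. The key name c-by-id is made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- Hypothetical key: look up a C element by its id attribute -->
  <xsl:key name="c-by-id" match="C" use="@id"/>
  <xsl:template match="U">
    <xsl:variable name="start-node" select="key('c-by-id', 'cb01s')"/>
    <xsl:variable name="end-node" select="key('c-by-id', 'cb02e')"/>
    <xsl:text>Result: </xsl:text>
    <!-- T elements between the two markers = T before the end minus T before the start -->
    <xsl:value-of select="count($end-node/preceding::T) - count($start-node/preceding::T)"/>
  </xsl:template>
</xsl:stylesheet>
```

On the short fragment from the question, six <T> elements precede cb02e and none precede cb01s, so this also gives Result: 6. Each pair costs two passes over the preceding axis instead of one inner scan per <T>.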

XSL check multiple nodes exist with for-each

If I have multiple nodes in an XML document and want to check that each of them has a particular child node, how would I do that with a for-each loop in XSLT 2?
<A>
<B>
<C>test</C>
</B>
<B>
<C>test</C>
</B>
</A>
For example in the document above, we want to iterate through all B Nodes in the document and ascertain if C exists with the value 'test' for that B node.
"we want to iterate through all B Nodes in the document and ascertain if C exists with the value 'test' for that B node"
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="A/B[C='test']">
<!-- Rest of XSLT -->
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
You can add 'tests'/predicates using [].
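If the goal is a single yes/no answer for the whole document rather than a loop over the matching B elements, XPath 2.0 has a quantified expression for that. A minimal sketch, assuming the A/B/C structure from the question:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- true only if every B has at least one C child equal to 'test' -->
    <xsl:value-of select="every $b in A/B satisfies $b/C = 'test'"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this outputs true. The same expression can be used in the test attribute of xsl:if or xsl:when.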

How to select unique child nodes of all siblings in XSLT 1

I'm looking for the best way to get all unique (no duplicates) nested nodes of all sibling nodes. The node I am interested in is "Gases". The sibling nodes are "Content". My simplified XML:
<Collection>
<Content>
<Html>
<root>
<Gases>NO2</Gases>
<Gases>CH4</Gases>
<Gases>O2</Gases>
</root>
</Html>
</Content>
<Content>
<Html>
<root>
<Gases>NO2</Gases>
<Gases>CH4</Gases>
<Gases>CO</Gases>
<Gases>LEL</Gases>
<Gases>NH3</Gases>
</root>
</Html>
</Content>
</Collection>
Desired result: NO2 CH4 O2 CO LEL NH3
I'm new to XSLT so any help would be much appreciated. I've been trying to use XPATH, similar to here, but with no luck.
This XSLT stylesheet will produce the desired output. Note that it relies on there being no duplicate Gases element inside a single Content element.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<!-- Match Gases elements whose value does not appear in a Gases element inside a previous
Content element. -->
<xsl:template match="//Gases[not(. = ancestor::Content/preceding-sibling::Content//Gases)]">
<xsl:value-of select="."/>
<xsl:text> </xsl:text>
</xsl:template>
<!-- Need to override the built-in template for text nodes, otherwise they will still get
printed out. -->
<xsl:template match="text()"/>
</xsl:stylesheet>
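If a single Content element may contain the same gas twice, the usual XSLT 1.0 alternative is Muenchian grouping, which is a different technique from the stylesheet above. A minimal sketch, where the key name gas-by-value is made up for illustration:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Hypothetical key: index every Gases element by its string value -->
  <xsl:key name="gas-by-value" match="Gases" use="."/>
  <xsl:template match="/">
    <!-- Keep a Gases element only if it is the first one in the document with that value -->
    <xsl:for-each select="//Gases[generate-id() = generate-id(key('gas-by-value', .)[1])]">
      <xsl:if test="position() &gt; 1">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this gives NO2 CH4 O2 CO LEL NH3, in document order and without a trailing space.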

XPATH selecting entire tree including only the first

Given the following structure, in XPATH, I want to select the entire tree but only include the first date thus excluding all of the other dates. The number of dates after the first date is not constant. Any ideas? My apologies if the format isn't correct.
<A>
<B>
<DATE>04272011</DATE>
<C>
<D>
<DATE>02022011</DATE>
</D>
<D>
<DATE>03142011</DATE>
</D>
</C>
</B>
</A>
My apologies.
A better example
<NOTICES>
<SNOTE>
<DATE>01272011</DATE>
<ZIP>35807</ZIP>
<CLASSCOD>A</CLASSCOD>
<EMAIL>
<ADDRESS>address 1</ADDRESS>
</EMAIL>
<CHANGES>
<MOD>
<DATE>02022011</DATE>
<MODNUM>12345</MODNUM>
<EMAIL>
<ADDRESS>address 2</ADDRESS>
</EMAIL>
</MOD>
<MOD>
<DATE>03022011</DATE>
<MODNUM>56789</MODNUM>
<EMAIL>
<ADDRESS>address 3</ADDRESS>
</EMAIL>
</MOD>
</CHANGES>
</SNOTE>
</NOTICES>
I'm breaking up one large xml file into individual XML files. My original XPATH statement is
/NOTICES/SNOTE
Each individual xml file looks fine except it pulls in all of the dates: This is my desired output.
<SNOTE>
<DATE>01272011</DATE>
<ZIP>35807</ZIP>
<CLASSCOD>A</CLASSCOD>
<EMAIL>
<ADDRESS>address 1</ADDRESS>
</EMAIL>
<CHANGES>
<MOD>
<MODNUM>12345</MODNUM>
<EMAIL>
<ADDRESS>address 2</ADDRESS>
</EMAIL>
</MOD>
<MOD>
<MODNUM>56789</MODNUM>
<EMAIL>
<ADDRESS>address 3</ADDRESS>
</EMAIL>
</MOD>
</CHANGES>
</SNOTE>
XPath is a query language for XML documents and as such it cannot alter the structure of the document (such as insert/delete/rename nodes).
What you need is an XSLT transformation -- as simple as this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="DATE[preceding::DATE]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<A>
<B>
<DATE>04272011</DATE>
<C>
<D>
<DATE>02022011</DATE>
</D>
<D>
<DATE>03142011</DATE>
</D>
</C>
</B>
</A>
the wanted, correct result is produced:
<A>
<B>
<DATE>04272011</DATE>
<C>
<D/>
<D/>
</C>
</B>
</A>
If by "select the entire tree" you mean "select the set of all the nodes in the tree" (except the non-first DATE elements), that can be done:
"//node()[not(self::DATE) or not(preceding::DATE)]"
Then, the non-first <DATE> element nodes will not themselves be in the selected nodeset, but nodes in the selected nodeset (such as the root node, or <D>) will still have <DATE> descendants.
If instead you want to select the tree (i.e. the root node), or rather a modified version of it, such that <D> elements do not have any <DATE> children, then that requires modification of the tree. XPath can't modify XML trees by itself. You need an XML transformation technology, such as XSLT or an XML DOM library.
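One caveat when the input holds several SNOTE elements, as the second example in the question suggests: preceding::DATE looks across the whole document, so the first DATE of every SNOTE after the first one would be removed as well. A minimal sketch of a variant that keeps the first DATE within each SNOTE, assuming SNOTE elements are not nested inside one another:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity rule: copy everything as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop any DATE that is not the first DATE inside its own SNOTE -->
  <xsl:template match="SNOTE//DATE[generate-id() != generate-id(ancestor::SNOTE[1]/descendant::DATE[1])]"/>
</xsl:stylesheet>
```

The generate-id() comparison is used here purely as a node identity test in XSLT 1.0; with an XSLT 2.0 processor the same check could be written with the is operator.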

Apply template on preceding-sibling of following sibling of current node

Hi, I have the following input:
<Root>
<A rename="yes"/>
<B rename="no"/>
<C rename="no"/>
<D rename="yes"/>
<E rename="no"/>
<F all="yes"/>
</Root>
Currently I am at <A> and I want to apply a template to the element whose @rename="yes" that comes before element <F>.
I am trying to do something like:
<xsl:apply-templates select=
"following-sibling::*[@all='yes']/preceding-sibling::node()[@rename='yes'" />
But I am not getting the expected output. Please suggest.
"Currently i am at <A> and i want apply template on the element whose @rename="yes", that is coming before element <F>"
You want this XPath expression (assuming A has only one following sibling named F):
following-sibling::F/preceding-sibling::*[@rename='yes'][1]
It selects any element whose rename attribute has value of "yes" and that is the first such preceding sibling of any following sibling (of the current node) element named F.
Here is a complete XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="A">
<xsl:apply-templates mode="found" select=
"following-sibling::F/preceding-sibling::*[@rename='yes'][1]"/>
</xsl:template>
<xsl:template match="*" mode="found">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<Root>
<A rename="yes"/>
<B rename="no"/>
<C rename="no"/>
<D rename="yes"/>
<E rename="no"/>
<F all="yes"/>
</Root>
the wanted, correct result is produced:
<D rename="yes"/>