generate-id() too slow for large document - xslt

I have a large XML document containing annotated speech transcripts. The following is a short fragment:
<?xml version="1.0" encoding="UTF-8"?>
<U>
<A/>
<C type="start" id="cb01s"/>
<P/>
<T>a</T>
<T>woman</T>
<P/>
<T>took</T>
<T>off</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb02e"/>
<P/>
<T>but</T>
<P/>
<F/>
<RT>
<O>
<C type="start" id="cb03s"/>
<T>her</T>
<T>bag</T>
<P/>
<T>are</T>
</O>
<P/>
<E>
<C type="start" id="cb04s"/>
<T>her</T>
<T>bag</T>
<T>are</T>
</E>
</RT>
<P/>
<T>still</T>
<P/>
<T>in</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb05e"/>
<PC>.</PC>
</U>
The basic task I need to do is to get the number of <T> nodes between certain pairs of <C> nodes. I've used the following stylesheet fragment to do this (illustrating with one specific pair of <C> nodes).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-node" select="descendant::C[@id = 'cb01s']"/>
<xsl:variable name="end-node" select="descendant::C[@id = 'cb02e']"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count($start-node/following::T[following::C[generate-id(.) = generate-id($end-node)]])"/>
</xsl:template>
</xsl:stylesheet>
This works fine on such a short XML fragment as above and gives the correct result: Result: 6.
However, the actual XML document contains tens of thousands of <C> nodes and even more <T> nodes, so when I run the stylesheet on it, the result comes back very slowly (it would probably take days to finish completely). I suppose the problem is that each time the <xsl:value-of> line is evaluated, the processor (Saxon) checks all <T> nodes and generates IDs for <C> nodes multiple times (i.e., the work grows quadratically rather than linearly), and that slows everything down.
Is there a way to speed up the process while still using generate-id()? Or do I need to get the number of <T> nodes with some alternate approach?

You do not need generate-id() just to avoid matching <C> elements intervening between the start and end nodes. You are matching <C> elements by their id attributes in the first place, and I see no reason not to use that more directly. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-id" select="'cb01s'"/>
<xsl:variable name="end-id" select="'cb02e'"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count(descendant::C[@id = $start-id]/following::T[following::C[@id = $end-id][1]])"/>
</xsl:template>
</xsl:stylesheet>
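You can simplify that by removing the [1] position predicate if you can rely on the <C> element @id values being unique in the document.
If generate-id() is indeed the primary cause of your performance problem, then avoiding it altogether ought to provide a big boost. Note, however, that this expression still evaluates following::C once for every candidate <T>, so on very large inputs the cost may remain roughly quadratic.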
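A linear alternative relies on one observation: because <T> elements never contain <C> markers, the <T> elements between two markers are exactly those after the start marker minus those after the end marker. In XPath 1.0 that would read count($start-node/following::T) - count($end-node/following::T), which avoids the nested following:: scan; whether Saxon evaluates it faster on the real data is an assumption worth measuring. The sketch below illustrates the idea in Python with the standard-library ElementTree rather than in XSLT; marker_index, count_by_subtraction and count_single_pass are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The sample fragment from the question, compacted onto fewer lines.
SAMPLE = """<U><A/><C type="start" id="cb01s"/><P/><T>a</T><T>woman</T><P/>
<T>took</T><T>off</T><T>the</T><T>train</T><C type="end" id="cb02e"/><P/>
<T>but</T><P/><F/><RT><O><C type="start" id="cb03s"/><T>her</T><T>bag</T>
<P/><T>are</T></O><P/><E><C type="start" id="cb04s"/><T>her</T><T>bag</T>
<T>are</T></E></RT><P/><T>still</T><P/><T>in</T><T>the</T><T>train</T>
<C type="end" id="cb05e"/><PC>.</PC></U>"""


def marker_index(order, c_id):
    """Position of the <C> element with the given id in document order."""
    return next(i for i, e in enumerate(order) if e.tag == "C" and e.get("id") == c_id)


def count_by_subtraction(root, start_id, end_id):
    """Mirror count(start/following::T) - count(end/following::T)."""
    order = list(root.iter())  # iter() is a pre-order walk, i.e. document order

    def t_after(c_id):
        # <C> markers are empty, so "following" is simply "later in document order".
        return sum(1 for e in order[marker_index(order, c_id) + 1:] if e.tag == "T")

    return t_after(start_id) - t_after(end_id)


def count_single_pass(root, start_id, end_id):
    """Count T elements seen after the start marker and before the end marker."""
    counting, total = False, 0
    for elem in root.iter():
        if elem.tag == "C" and elem.get("id") == end_id:
            break
        if elem.tag == "C" and elem.get("id") == start_id:
            counting = True
        elif counting and elem.tag == "T":
            total += 1
    return total


root = ET.fromstring(SAMPLE)
print(count_by_subtraction(root, "cb01s", "cb02e"))  # 6
print(count_single_pass(root, "cb01s", "cb02e"))  # 6
```

Both functions touch each element a constant number of times, so their cost grows linearly with the document instead of with the product of the number of <T> and <C> nodes.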

Related

Find Text Node with largest number of elements preceding

With XML:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<A>
<B>
<C>
<Name>Bob</Name>
</C>
<D>
<Operation>Yes</Operation>
<E>
<Operation>No</Operation>
</E>
</D>
</B>
</A>
</Root>
I have XSLT that produces text output:
/Root/A/B/C/Name
/Root/A/B/D/Operation
/Root/A/B/D/E/Operation
Problem:
The deepest text node is /Root/A/B/D/E/Operation.
I'd like to be able to arrive at the number 5 (the largest number of element levels in front of a leaf element, here Root/A/B/D/E in front of Operation) before producing the output above.
It should work for any XML document; the element names are not known in advance.
The XSLT 3 stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="let $leaf-elements := //*[not(*)],
$max-anc := max($leaf-elements/count(ancestor::*))
return ($max-anc, $leaf-elements!string-join(ancestor-or-self::*/name(), '/'))" separator="&#10;"/>
</xsl:template>
</xsl:stylesheet>
run against your sample input outputs
5
Root/A/B/C/Name
Root/A/B/D/Operation
Root/A/B/D/E/Operation
Online sample https://xsltfiddle.liberty-development.net/6qVRKwh.
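For comparison, here is a Python sketch of the same two steps (collect the leaf elements, take the maximum ancestor count, then print each leaf's path) using the standard-library ElementTree. leaf_chains is a hypothetical helper; since ElementTree elements have no parent pointers, the ancestor chain is carried down the recursion instead of being read from an ancestor axis.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Root><A><B>
<C><Name>Bob</Name></C>
<D><Operation>Yes</Operation><E><Operation>No</Operation></E></D>
</B></A></Root>"""


def leaf_chains(root):
    """Return the tag chain (root first) for every element with no element children."""
    chains = []

    def walk(elem, ancestors):
        chain = ancestors + [elem.tag]
        children = list(elem)
        if not children:  # same test as //*[not(*)]
            chains.append(chain)
        for child in children:
            walk(child, chain)

    walk(root, [])
    return chains


root = ET.fromstring(SAMPLE)
chains = leaf_chains(root)
# A leaf's ancestor count is its chain length minus the leaf itself.
max_anc = max(len(chain) - 1 for chain in chains)
print(max_anc)
for chain in chains:
    print("/".join(chain))
```

Element names in a namespace would appear in ElementTree's {uri}local form, and very deep documents could hit Python's recursion limit; both are limitations of this sketch, not of the XSLT 3 answer.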

XSLT: Copy two files into one common structure

I am trying to merge the results of the SSIS Data Profiler Task for several tables into one XML file, so that I can inspect them in a single file in the "Data Profiler Viewer". The whole problem boils down to the strongly simplified XML transformation below:
File 1 (test_1.xml):
<a xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<b id="1"/>
<c>
<2: any other XML-structure to come here/>
</c>
</a>
File 2 (test_2.xml):
<a xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<b id="1"/>
<c>
<1: any other XML-structure to come here/>
</c>
</a>
(Element b is always exactly the same.)
Expected result:
<a xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<b id="1"/>
<c>
<1: any other XML-structure to come here/>
<2: any other XML-structure to come here/>
</c>
</a>
Any help is greatly appreciated! I will provide the solution to the original problem here.
Another try:
<?xml version='1.0' encoding="utf-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:t="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no" version="1.0" encoding="UTF-8"/>
<xsl:template match="t:c">
<xsl:element name="c" namespace="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<xsl:copy-of select="*" />
<xsl:copy-of select="document('test_2.xml')//t:c/node() " />
</xsl:element>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Checked with Xalan (classpath set in the environment):
java org.apache.xalan.xslt.Process -IN test1_1.xml -XSL test1.xslt -OUT test1_12.xml
and with Saxon (after changing the stylesheet to version="1.1"):
java -jar saxon-9.1.0.8j.jar -s:test_1.xml -xsl:test_1.xslt -o:test_12.xml
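The stylesheet above copies file 1 and appends the children of file 2's <c> element inside file 1's <c>. A Python sketch of the same merge with the standard-library ElementTree is shown below. The part1/part2 elements are hypothetical stand-ins for the "any other XML-structure" placeholders, and merge_c is a hypothetical helper. As in the stylesheet, file 1's children come first; to get the order shown in the expected result, insert file 2's children at the front instead.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/sqlserver/2008/DataDebugger/"
ET.register_namespace("", NS)  # serialize the namespace as the default xmlns

# Hypothetical stand-ins for the placeholder content of each file.
FILE_1 = f'<a xmlns="{NS}"><b id="1"/><c><part2/></c></a>'
FILE_2 = f'<a xmlns="{NS}"><b id="1"/><c><part1/></c></a>'


def merge_c(first_xml, second_xml):
    """Copy file 1 unchanged and append the children of file 2's <c> to file 1's <c>."""
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)
    tag_c = f"{{{NS}}}c"  # ElementTree's {namespace}local-name form
    target = first.find(tag_c)
    for child in list(second.find(tag_c)):
        target.append(child)
    return ET.tostring(first, encoding="unicode")


print(merge_c(FILE_1, FILE_2))
```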

XSL check multiple nodes exist with for-each

If I have multiple nodes in an XML document and want to check that they all have a child node that exists, how would I do that with a for-each loop in XSLT 2.0?
<A>
<B>
<C>test</C>
</B>
<B>
<C>test</C>
</B>
</A>
For example in the document above, we want to iterate through all B Nodes in the document and ascertain if C exists with the value 'test' for that B node.
"we want to iterate through all B Nodes in the document and ascertain if C exists with the value 'test' for that B node"
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="A/B[C='test']">
<!-- Rest of XSLT -->
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
You can add tests (predicates) to a path step using [].
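The predicate answers "which B nodes have C = 'test'"; the question also asks whether all of them do. In XPath 2.0 that can be written without a loop as every $b in A/B satisfies $b/C = 'test'. The Python sketch below, using the standard-library ElementTree (whose findall supports a [tag='text'] predicate), computes both results; check_b_nodes is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check_b_nodes(xml_text):
    """Return (number of B with a C child equal to 'test', whether every B has one)."""
    root = ET.fromstring(xml_text)  # root is <A>
    all_b = root.findall("B")
    # ElementTree's [tag='text'] predicate mirrors the XPath filter A/B[C='test'].
    matching = root.findall("B[C='test']")
    # Like XPath's "every", this is vacuously true when there are no B elements.
    return len(matching), len(matching) == len(all_b)


print(check_b_nodes("<A><B><C>test</C></B><B><C>test</C></B></A>"))  # (2, True)
print(check_b_nodes("<A><B><C>test</C></B><B><C>other</C></B></A>"))  # (1, False)
```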

How to select unique child nodes of all siblings in XSLT 1

I'm looking for the best way to get all unique (no duplicates) nested nodes of all sibling nodes. The node I am interested in is "Gases". The sibling nodes are "Content". My simplified XML:
<Collection>
<Content>
<Html>
<root>
<Gases>NO2</Gases>
<Gases>CH4</Gases>
<Gases>O2</Gases>
</root>
</Html>
</Content>
<Content>
<Html>
<root>
<Gases>NO2</Gases>
<Gases>CH4</Gases>
<Gases>CO</Gases>
<Gases>LEL</Gases>
<Gases>NH3</Gases>
</root>
</Html>
</Content>
</Collection>
Desired result: NO2 CH4 O2 CO LEL NH3
I'm new to XSLT, so any help would be much appreciated. I've been trying to use XPath, similar to here, but with no luck.
This XSLT stylesheet will produce the desired output. Note that it relies on there being no duplicate Gases element inside a single Content element.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<!-- Match Gases elements whose value does not appear in a Gases element inside a previous
Content element. -->
<xsl:template match="//Gases[not(. = ancestor::Content/preceding-sibling::Content//Gases)]">
<xsl:value-of select="."/>
<xsl:text> </xsl:text>
</xsl:template>
<!-- Need to override the built-in template for text nodes, otherwise they will still get
printed out. -->
<xsl:template match="text()"/>
</xsl:stylesheet>
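To check the expected output independently, here is a Python sketch with the standard-library ElementTree that keeps each Gases value the first time it appears. Unlike the stylesheet above, it does not rely on values being unique within a single Content element. In XSLT 1.0 the usual general-purpose technique for this is Muenchian grouping with xsl:key, which is not shown here. unique_gases is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Collection>
<Content><Html><root>
<Gases>NO2</Gases><Gases>CH4</Gases><Gases>O2</Gases>
</root></Html></Content>
<Content><Html><root>
<Gases>NO2</Gases><Gases>CH4</Gases><Gases>CO</Gases><Gases>LEL</Gases><Gases>NH3</Gases>
</root></Html></Content>
</Collection>"""


def unique_gases(xml_text):
    """Distinct Gases values in first-seen document order."""
    root = ET.fromstring(xml_text)
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates stably.
    return list(dict.fromkeys(g.text for g in root.iter("Gases")))


print(" ".join(unique_gases(SAMPLE)))  # NO2 CH4 O2 CO LEL NH3
```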

XSLT node Traversal

Here is a snippet of the XML:
<?xml version="1.0" encoding="iso-8859-1" ?>
<NetworkAppliance id="S123456">
<Group id="9">
<Probe id="1">
<Value>74.7</Value>
</Probe>
</NetworkAppliance>
I want to get the single point value of 74.7. There are many groups with unique IDs and many Probes under each group with unique IDs, each with values.
I am looking for example XSLT code that can get me this one value. Here is what I have that does not work:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" version="3.2" />
<xsl:template match="NetworkAppliance">
<xsl:apply-templates select="Group[@id='9']"/>
</xsl:template>
<xsl:template match="Group">
Temp: <xsl:value-of select="Probe[@id='1']/Value"/>
<br/>
</xsl:template>
</xsl:stylesheet>
Here is what worked for me in the end:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited by XMLSpy® -->
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<xsl:for-each select="NetworkAppliance/Group[@id=9]/Probe[@id=1]">
Value: <xsl:value-of select="Value" />
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Don't forget that you can select several levels at once. Fixing your XML to:
<?xml version="1.0" encoding="iso-8859-1" ?>
<NetworkAppliance id="S123456">
<Group id="9">
<Probe id="1">
<Value>74.7</Value>
</Probe>
</Group>
</NetworkAppliance>
and using this stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" version="3.2" />
<xsl:template match="/">
Temp: <xsl:value-of select="//Group[@id='9']/Probe[@id='1']/Value"/>
<br/>
</xsl:template>
</xsl:stylesheet>
we can pick out that one item you're interested in.
Points to note:
The // part of the expression means that the search for Group nodes takes place throughout the whole tree, finding Group nodes at whatever depth they're at.
The [@id='9'] part selects those Group nodes with id of 9
The Probe[@id='1'] part immediately after that selects those children of the Group nodes it found where the id is 1, and so on.
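The same multi-level path can be tried outside XSLT. Python's standard-library ElementTree supports a subset of XPath that includes .// and [@attr='value'], so the sketch below reuses the path almost verbatim. probe_value is a hypothetical helper; note that ElementTree compares attribute values as strings, so the ids are passed quoted.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<NetworkAppliance id="S123456">
  <Group id="9">
    <Probe id="1">
      <Value>74.7</Value>
    </Probe>
  </Group>
</NetworkAppliance>"""


def probe_value(xml_text, group_id, probe_id):
    """Return the text of Value under the given Group/Probe, or None if absent."""
    root = ET.fromstring(xml_text)  # root is <NetworkAppliance>
    value = root.find(f".//Group[@id='{group_id}']/Probe[@id='{probe_id}']/Value")
    return None if value is None else value.text


print(probe_value(SAMPLE, "9", "1"))  # 74.7
```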
<xsl:value-of select="/NetworkAppliance/Group[@id=9]/Probe[@id=1]/Value"/>
XSLT is just one of the tools in the box, and nothing without XPath.
The XPath for the text value of a node is /node/text(), so:
<xsl:value-of select="Probe[@id='1']/Value/text()"/>