XSLT/XPath: sum function performance

I often use this XPath expression: sum(preceding::*/string-length())
It does what I need it to do (it provides a character count of all text up to this context node in the XML file).
Problem: it is slow.
Is there a different built-in function that I should be using instead? Or an extension?
UPDATE:
Based on Michael Kay's comment, I explored the XSLT 3.0 xsl:accumulator declaration. It was my first try with XSLT 3.0 (I had to update OxygenXML to make it work). I haven't fully adapted it to my needs yet, but the initial test below shows promise.
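<xsl:stylesheet version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="data:,f">
<xsl:output method="xml" />
<!-- make the accumulator available on the source document -->
<xsl:mode use-accumulators="#all"/>
<!-- running total of string-length() over every text/* element -->
<xsl:accumulator name="f:string-summ" as="xs:integer" initial-value="0">
<xsl:accumulator-rule match="text/*" select="$value + string-length()"/>
</xsl:accumulator>
<xsl:template match="text/*">
<!-- value after this element and its descendants have been visited -->
<xsl:value-of select="accumulator-after('f:string-summ')" />
</xsl:template>
</xsl:stylesheet>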
Off topic: Stack Overflow needs an "XSLT-3.0" tag.

If you evaluate this expression on every node, your stylesheet's performance will be O(n^2) in the number of nodes, because each evaluation walks back over all the preceding elements.
The expression is incorrect anyway. The preceding axis gives you your parent's preceding siblings and also the children of your parent's preceding siblings; since string-length() of an element already includes the text of all its descendants, the string length of your cousins is counted more than once.
Try defining a memo function something like this:
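<xsl:function name="f:preceding-string-length" as="xs:integer" saxon:memo-function="yes">
<xsl:param name="n" as="element()"/>
<!-- saxon = http://saxon.sf.net/; result: length of text in the elements before $n, i.e.
the nearest preceding sibling plus everything before it, or else everything before the parent -->
<xsl:variable name="prev" select="$n/preceding-sibling::*[1]"/>
<xsl:sequence select="if ($prev) then f:preceding-string-length($prev) + string-length($prev)
else if ($n/parent::*) then f:preceding-string-length($n/parent::*) else 0"/>
</xsl:function>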
Or use an XSLT 3.0 accumulator, which amounts to much the same thing.
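As a rough illustration of the accumulator route, here is a sketch that counts characters over text nodes rather than elements, so nested elements are not counted twice. It assumes an XSLT 3.0 processor such as a current Saxon; the accumulator name f:chars-before, the data:,f namespace, and the chars-before output attribute are illustrative choices, not part of the original answer. The value that accumulator-before() returns on an element is intended to equal sum(preceding::text()/string-length()), but it is computed in a single pass through the document.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="data:,f"
    exclude-result-prefixes="xs f">
  <!-- Make the accumulator applicable to the principal source document -->
  <xsl:mode use-accumulators="#all"/>
  <!-- Running total of the lengths of all text nodes seen so far, in document order -->
  <xsl:accumulator name="f:chars-before" as="xs:integer" initial-value="0">
    <xsl:accumulator-rule match="text()" select="$value + string-length()"/>
  </xsl:accumulator>
  <!-- Copy each element and record how many characters of text precede it -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="chars-before" select="accumulator-before('f:chars-before')"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:mode declaration states the accumulator's applicability explicitly rather than relying on defaults; text nodes themselves are copied by the built-in template rules.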

I don't think the sum function itself is slow; it is the navigation to all preceding elements and the computation of the string length of all their contents that is expensive. As for optimizing it, which XSLT 2.0 processor do you use?


Constructing, not selecting, XSL node set variable

I wish to construct an XSL node set variable using a contained for-each loop. It is important that the constructed node set is the original (a selected) node set, not a copy.
Here is a much simplified version of my problem (which could of course be solved with a select, but that's not the point of the question). I've used the <name> node to test that the constructed node set variable is in fact in the original tree and not a copy.
XSL version 1.0, processor is msxsl.
Non-working XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<xsl:output method="text" encoding="iso-8859-1" omit-xml-declaration="yes" />
<xsl:template match="/">
<xsl:variable name="entries">
<xsl:for-each select="//entry">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="entryNodes" select="msxsl:node-set($entries)"/>
<xsl:for-each select="$entryNodes">
<xsl:value-of select="/root/name"/>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XML input:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<name>X</name>
<entry>1</entry>
<entry>2</entry>
</root>
Wanted output:
X1X2
Actual output:
12
Of course the (or a) problem is the copy-of, but I can't work out a way around this.
There isn't a "way around it" in XSLT 1.0 - it's exactly how this is supposed to work. When you have a variable that is declared with content rather than with a select then that content is a result tree fragment consisting of newly-created nodes (even if those nodes are a copy of nodes from the original tree). If you want to refer to the original nodes attached to the original tree then you must declare the variable using select. A better question would be to detail the actual problem and ask how you could write a suitable select expression to find the nodes you want without needing to use for-each - most uses of xsl:if or xsl:choose can be replaced with suitably constructed predicates, maybe involving judicious use of xsl:key, etc.
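A minimal XSLT 1.0 sketch of that advice, using the question's input: the selection is done with a predicate and a key instead of a for-each with xsl:if and copy-of, so both variables hold the original entry nodes. The key name entry-by-value and the filter condition are made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Illustrative key: entry elements indexed by their text -->
  <xsl:key name="entry-by-value" match="entry" use="."/>
  <xsl:template match="/">
    <!-- A predicate instead of for-each + xsl:if + copy-of: still the original nodes -->
    <xsl:variable name="laterEntries" select="/root/entry[. &gt; 1]"/>
    <!-- A key lookup also returns nodes from the source tree -->
    <xsl:variable name="firstEntry" select="key('entry-by-value', '1')"/>
    <xsl:for-each select="$firstEntry | $laterEntries">
      <!-- /root/name resolves because each node is still in the original tree -->
      <xsl:value-of select="/root/name"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the question's XML input this should produce the wanted X1X2, since the union returns the nodes in document order.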
In XSLT 2.0 it's much more flexible. There's no distinction between node sets and result tree fragments, and the content of an xsl:variable is treated as a generic "sequence constructor" which can give you new nodes if you construct or copy them:
<xsl:variable name="example" as="node()*">
<xsl:copy-of select="//entry" />
</xsl:variable>
or the original nodes if you use xsl:sequence:
<xsl:variable name="example" as="node()*">
<xsl:sequence select="//entry" />
</xsl:variable>
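Applied to the question's example, the XSLT 2.0 version can keep the for-each the asker wanted, as long as it uses xsl:sequence and the variable is typed. This is a sketch and needs an XSLT 2.0 processor (MSXML will not run it):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Typed variable + xsl:sequence: references to the source entries, not copies -->
    <xsl:variable name="entryNodes" as="element(entry)*">
      <xsl:for-each select="//entry">
        <xsl:sequence select="."/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:for-each select="$entryNodes">
      <xsl:value-of select="/root/name"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because $entryNodes holds references rather than copies, /root/name is evaluated against the source document and the output should be X1X2.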
"I wish to construct an XSL node set variable using a contained for-each loop."
I have no idea what that means.
"It is important that the constructed node set is the original (a selected) node set, not a copy."
This part I think I understand a little better. It seems you need to replace:
<xsl:variable name="entries">
<xsl:for-each select="//entry">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:variable>
with:
<xsl:variable name="entries" select="//entry"/>
or, preferably:
<xsl:variable name="entries" select="root/entry"/>
The resulting variable is a node-set of the original entry nodes, so you can do simply:
<xsl:for-each select="$entries">
<xsl:value-of select="/root/name"/>
<xsl:value-of select="."/>
</xsl:for-each>
to get your expected result.
Of course, you could do the same thing by operating directly on the original nodes, in their original context - without requiring the variable.
In response to the comments you've made:
We obviously need a better example here, but I think I am getting a vague idea of where you want to go with this. But there are a few things you must understand first:
1. In order to construct a variable which contains a node-set of nodes in their original context, you must use select. This does not place any limits whatsoever on what you can select. You can do your selection all at once, or in stages, or even in a loop (here I mean a real loop, i.e. a recursive template). You can combine the intermediate selections you have made in any way sets can be combined: union, intersection, or difference. But you must use select in all these steps, otherwise you will end up with a set of new nodes, no longer having the context they did in the source tree.
In other words, the only difference between using xsl:copy-of and select is that the former creates new nodes, which is precisely what you wish to avoid.
2. xsl:for-each is not a loop. It has no hierarchy or chronology. Conceptually, all the nodes are processed independently (as if in parallel), and there is no way to use the result of a previous iteration in the current one, because no iteration is "previous" to another.
If you try to use xsl:for-each in order to add each of n processed nodes to a pre-existing node-set, you will end up with n results, each containing the pre-existing node-set joined with one of the processed nodes.
3. I think you'll find the XPath language is quite powerful, and allows you to select the nodes you want without having to go through the complicated loops you hint at.
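The set operations mentioned in point 1 can all be expressed with select in XSLT 1.0. Union is the | operator; intersection and difference have no operator of their own, but the count() idiom below (sometimes called the Kayessian method) tests node identity. The variable names are illustrative and the input is the question's XML:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- $a: both entry elements; $b: the second entry plus the name element -->
    <xsl:variable name="a" select="/root/entry"/>
    <xsl:variable name="b" select="/root/entry[. = 2] | /root/name"/>
    <xsl:variable name="union" select="$a | $b"/>
    <!-- intersection: nodes of $a whose addition does not enlarge $b -->
    <xsl:variable name="intersection" select="$a[count(. | $b) = count($b)]"/>
    <!-- difference: nodes of $a that are not in $b -->
    <xsl:variable name="difference" select="$a[count(. | $b) != count($b)]"/>
    <xsl:value-of select="concat(count($union), ' ', count($intersection), ' ', count($difference))"/>
  </xsl:template>
</xsl:stylesheet>
```

With the question's input this should print 3 1 1, and every variable still holds nodes from the source tree.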
It might help if you showed us a problem that can't be trivially solved in XSLT 1.0. You can't solve your problem the way you are asking for: there is no equivalent of xsl:sequence in XSLT 1.0. But the problem you have shown us can be solved without such a construct. So please explain why you need what you are asking for.
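For the "real loop" mentioned in point 1: in XSLT 1.0 that means a recursive named template, where each call receives the node-set built so far. A sketch against the question's input; the template name collect and the de-duplication condition are hypothetical, chosen only to show a step that depends on the previous result:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:call-template name="collect">
      <xsl:with-param name="pending" select="/root/entry"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Each call sees the node-set built by the previous one -->
  <xsl:template name="collect">
    <xsl:param name="pending"/>           <!-- nodes not yet examined -->
    <xsl:param name="acc" select="/.."/>  <!-- /.. is an empty node-set -->
    <xsl:choose>
      <xsl:when test="$pending">
        <xsl:call-template name="collect">
          <xsl:with-param name="pending" select="$pending[position() &gt; 1]"/>
          <!-- add the next node only if no collected node has the same value -->
          <xsl:with-param name="acc" select="$acc | $pending[1][not(. = $acc)]"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- $acc holds original nodes, so /root/name still resolves -->
        <xsl:for-each select="$acc">
          <xsl:value-of select="/root/name"/>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Every node in $acc comes from a select, so it keeps its original context and the output should be X1X2. Deep recursion over very large node-sets may hit processor stack limits.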

Sum of Multiplied Values

I have a fairly convoluted XML file and I need to do a weighted average of a few values within it using XSL. I am able to complete a sum of the weights OR of the values, but I cannot get the multiplication to work. I get an error:
XPTY0004: A sequence of more than one item is not allowed as the first operand of '*'
I am not able to share the XML, but I have simplified the XML to the following example (assume there are a large number of foos):
<group>
<fooList>
<foo>
<attributeList>
<Attribute ID="1" Weight="0.5">
<otherParams />
</Attribute>
</attributeList>
<Properties>
<PhysicalProperties>
<Volume Average="125" Unknown="50" />
</PhysicalProperties>
</Properties>
</foo>
</fooList>
</group>
My current attempt to get the weighted average is the following:
<xsl:variable name="WeightedVolume" select="sum(/group/fooList/foo[attributeList/Attribute[@ID=$test_id]]/attributeList/Attribute/@Weight * /group/fooList/foo[attributeList/Attribute[@ID=$test_id]]/Properties/PhysicalProperties/Volume/@Average)"/>
I know there are similar questions available, but most of them deal with summing and multiplying values in a simpler structure like this:
<foo>
<Weight>0.5</Weight>
<VolumeAverage>125</VolumeAverage>
</foo>
The answer on this Stack Overflow question appeals to me, but I cannot seem to make it work.
I'm using Saxon-HE 9.5.1.1N from Saxonica, with Visual Studio 2013.
Edit:
I was able to get something to work in XSLT 2.0, but I need to have a fallback for XSLT 1.0.
<xsl:variable name="WeightedVolume" select="sum(for $i in /group/fooList/foo[attributeList/Attribute[@ID=$test_id]] return $i/attributeList/Attribute/@Weight * $i/Properties/PhysicalProperties/Volume/@Average)"/>
To follow the example in that question you linked to, you would use this in XSLT 2.0/XPath 2.0:
<xsl:variable name="FoosToCalculate"
select="/group/fooList/foo[attributeList/Attribute/@ID = $test_id]" />
<xsl:variable name="WeightedVolume"
select="sum($FoosToCalculate/(attributeList/Attribute/@Weight *
Properties/PhysicalProperties/Volume/@Average)
)"/>
Doing this summing in XSLT 1.0 is considerably more involved and typically involves either using recursive templates or some manifestation of the node-set() function. Here is an example of the latter:
<xsl:stylesheet version="1.0"
xmlns:ex="http://exslt.org/common"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<!-- determine $test_id however you need to -->
<xsl:variable name="products">
<xsl:for-each
select="/group/fooList/foo[attributeList/Attribute/@ID = $test_id]">
<product>
<xsl:value-of select="attributeList/Attribute/@Weight *
Properties/PhysicalProperties/Volume/@Average" />
</product>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="sum(ex:node-set($products)/product)"/>
</xsl:template>
</xsl:stylesheet>
For completeness, if you want to sum over a computed quantity in XSLT 1.0, there are three ways of doing it:
(a) recursion: write a recursive template that processes the items in the sequence one by one, computing the total as it goes.
(b) create an XML tree in which the computed quantities are node values, and then process this tree using the sum() function. To do this in a single stylesheet you will need the exslt:node-set() extension function.
(c) use an extension function provided by the XSLT vendor, or user-written using the facilities provided by the vendor for calling external functions.
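Option (a) might look like this for the weighted-volume question. It is a sketch: the default value of test_id is a placeholder, and each foo is assumed to have exactly one Attribute and one Volume, as in the sample XML (in XSLT 1.0, an arithmetic operand that is a node-set silently uses its first node):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- placeholder default; supply the real ID however you need to -->
  <xsl:param name="test_id" select="1"/>
  <xsl:template match="/">
    <xsl:call-template name="weighted-sum">
      <xsl:with-param name="foos"
          select="/group/fooList/foo[attributeList/Attribute/@ID = $test_id]"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Walk the node-set one foo at a time, carrying the running total -->
  <xsl:template name="weighted-sum">
    <xsl:param name="foos"/>
    <xsl:param name="total" select="0"/>
    <xsl:choose>
      <xsl:when test="$foos">
        <xsl:variable name="foo" select="$foos[1]"/>
        <xsl:call-template name="weighted-sum">
          <xsl:with-param name="foos" select="$foos[position() &gt; 1]"/>
          <xsl:with-param name="total" select="$total
              + $foo/attributeList/Attribute/@Weight
              * $foo/Properties/PhysicalProperties/Volume/@Average"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$total"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should output 62.5 (0.5 * 125). Whether the recursion is turned into a loop for long lists depends on the processor.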
In XSLT 2.0, it can always be done using the construct
sum(for $x in node-set return f($x))
where f is a function that computes the quantity.
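A sketch of that XSLT 2.0 construct with f written as a stylesheet function; the function name f:weighted-volume, the data:,f namespace, and the test_id default are assumptions for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="data:,f"
    exclude-result-prefixes="xs f">
  <xsl:output method="text"/>
  <!-- placeholder default; supply the real ID however you need to -->
  <xsl:param name="test_id" select="1"/>
  <!-- f($x): the weighted volume of one foo -->
  <xsl:function name="f:weighted-volume" as="xs:double">
    <xsl:param name="foo" as="element(foo)"/>
    <xsl:sequence select="number($foo/attributeList/Attribute/@Weight)
        * number($foo/Properties/PhysicalProperties/Volume/@Average)"/>
  </xsl:function>
  <xsl:template match="/">
    <xsl:value-of select="sum(for $x in /group/fooList/foo[attributeList/Attribute/@ID = $test_id]
                              return f:weighted-volume($x))"/>
  </xsl:template>
</xsl:stylesheet>
```

On the sample XML this should give 62.5. As with the original expression, number() raises a type error if a foo contains more than one Attribute, so that case would need an extra predicate.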

Building an xslt variable with elements from multiple files

I'm no XSLT guru and couldn't find a similar example online. I'd like to assemble a list of apps from multiple files into a single var that can be used for searching.
Basically, when I replace the original variable declaration with the new one, XSLT doesn't like it. I did output the variable contents to a file and they are identical in the formatting of the XML elements, so it must be failing because of some metadata linked to the variable somewhere.
XML element format in all the files
<include-application name="appname" type="blah"/>
Original variable
<xsl:variable name="applications" select="board/packaging/*/include-application"/>
New variable definition
<xsl:variable name="applications">
<xsl:copy-of select="board/packaging/*/include-application"/>
<xsl:for-each select="board/packaging/applications/include">
<xsl:variable name="appset" as="xs:string" select="@name"/>
<xsl:variable name="includefile" as="xs:string" select="concat('../share/appsets/', $appset, '.xml')"/>
<xsl:copy-of select="document($includefile)/applications/include-application"/>
</xsl:for-each>
</xsl:variable>
Then when I try to access the elements to pick something, it fails with the new variable definition (line 39 is the first line in the block below).
<xsl:variable name="type" select="$applications[@name = $appname]/@type"/>
<xsl:variable name="appid" select="$app-names/application-package-name[@name = $appname]/appid[@type = $type]/@value"/>
XPath error : Invalid type
runtime error: file xslt/blah.xslt line 39 element variable
Failed to evaluate the expression of variable 'type'.
Since you are using XSLT 2.0, as evidenced by your use of as=, you need to declare the type of your variable so that you can act on it later as a sequence of elements. I think all you need to do is change the first line of your variable to:
<xsl:variable name="applications" as="element(include-application)*">
...
That will tell the processor not to build a temporary tree, but rather a sequence of element nodes (what XSLT 1.0 calls a node set).
There is helpful information and a diagram on page 223 of my XSLT book that is available for free download on a "try and buy" basis at http://www.CraneSoftwrights.com/training/#ptux ... if you decide not to pay for the book, please delete the copy that you download for free.
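Applying that advice, a sketch of the complete variable in XSLT 2.0. It swaps xsl:copy-of for xsl:sequence so the variable holds references to the original elements, and it drops the intermediate xs:string variables. The appname default is a placeholder, and this needs an XSLT 2.0 processor such as Saxon; with an XSLT 1.0 processor the exslt:node-set() route is the one to use.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- placeholder: the application to look up -->
  <xsl:param name="appname" select="'appname'"/>
  <xsl:template match="/">
    <xsl:variable name="applications" as="element(include-application)*">
      <!-- applications declared inline -->
      <xsl:sequence select="board/packaging/*/include-application"/>
      <!-- applications from each referenced appset file -->
      <xsl:for-each select="board/packaging/applications/include">
        <xsl:sequence select="document(concat('../share/appsets/', @name, '.xml'))/applications/include-application"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:value-of select="$applications[@name = $appname]/@type"/>
  </xsl:template>
</xsl:stylesheet>
```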
The problem is probably that the variable contains an XML fragment (a result tree fragment) rather than a node-set (see this for example). Different XSLT processors have different extension functions to do the conversion; for example, using a Microsoft processor you would do:
<xsl:variable name="type" select="msxsl:node-set($applications)/include-application[@name = $appname]/@type"/>
Other processors have different functions (or the same function in a different namespace).
Using a combination of the above answers and looking into it more deeply, I used the exslt extension for converting to a node set and it works flawlessly now.
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common">
...
<xsl:variable name="applications">
...
<xsl:variable name="type" select="exslt:node-set($applications)/include-application[@name = $appname]/@type"/>

XSLT xsl:sequence. What is it good for..?

I know the following question is a bit of a beginner's question, but I need your help to understand a basic concept.
I would like to say first that I have been an XSLT programmer for 3 years, and yet there are some new and quite basic things I've been learning here that I never knew (in my job everyone learns to program on their own; there is no course involved).
My question is:
What is the usage of xsl:sequence?
I have been using xsl:copy-of in order to copy nodes as is, xsl:apply-templates in order to modify nodes I selected, and xsl:value-of for simple text.
I have never needed to use xsl:sequence. I would appreciate it if someone could show me an example of xsl:sequence usage that is preferable to, or cannot be achieved with, the instructions I noted above.
One more thing: I have read the xsl:sequence definition, of course, but I couldn't infer how it is useful.
<xsl:sequence> on an atomic value (or sequence of atomic values) is the same as <xsl:copy-of>: both just return a copy of their input. The difference comes when you consider nodes.
If $n is a single element node, e.g. as defined by something like
<xsl:variable name="n" select="/html"/>
Then
<xsl:copy-of select="$n"/>
Returns a copy of the node: it has the same name and child structure, but it is a new node with a new identity (and no parent).
<xsl:sequence select="$n"/>
Returns the node $n itself. The node returned has the same parent as $n and is identical to it according to the XPath is operator.
The difference is almost entirely masked in traditional (XSLT 1.0 style) template usage, because you never get access to the result of either operation: the result of the sequence constructor is implicitly copied to the output tree, so the fact that xsl:sequence doesn't make a copy is hidden.
<xsl:template match="a">
<x>
<xsl:sequence select="$n"/>
</x>
</xsl:template>
is the same as
<xsl:template match="a">
<x>
<xsl:copy-of select="$n"/>
</x>
</xsl:template>
Both make a new element node and copy the result of the content as children of the new node x.
However the difference is quickly seen if you use functions.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="data:,f">
<xsl:variable name="s">
<x>hello</x>
</xsl:variable>
<xsl:template name="main">
::
:: <xsl:value-of select="$s/x is f:s($s/x)"/>
:: <xsl:value-of select="$s/x is f:c($s/x)"/>
::
:: <xsl:value-of select="count(f:s($s/x)/..)"/>
:: <xsl:value-of select="count(f:c($s/x)/..)"/>
::
</xsl:template>
<xsl:function name="f:s">
<xsl:param name="x"/>
<xsl:sequence select="$x"/>
</xsl:function>
<xsl:function name="f:c">
<xsl:param name="x"/>
<xsl:copy-of select="$x"/>
</xsl:function>
</xsl:stylesheet>
Produces
$ saxon9 -it main seq.xsl
<?xml version="1.0" encoding="UTF-8"?>
::
:: true
:: false
::
:: 1
:: 0
::
Here the results of xsl:sequence and xsl:copy-of are radically different.
The most common use case for xsl:sequence is to return a result from xsl:function.
<xsl:function name="f:get-customers">
<xsl:sequence select="$input-doc//customer"/>
</xsl:function>
But it can also be handy in other contexts, for example
<xsl:variable name="x" as="element()*">
<xsl:choose>
<xsl:when test="$something">
<xsl:sequence select="//customer"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="//supplier"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
The key thing here is that it returns references to the original nodes, it doesn't make new copies.
Well, to return a value of a certain type you use xsl:sequence, because xsl:value-of, despite its name, always creates a text node (and has done since XSLT 1.0).
So in a function body you use
<xsl:sequence select="42"/>
to return an xs:integer value, you would use
<xsl:sequence select="'foo'"/>
to return an xs:string value and
<xsl:sequence select="xs:date('2013-01-16')"/>
to return an xs:date value and so on. Of course you can also return sequences with e.g. <xsl:sequence select="1, 2, 3"/>.
In my view, you wouldn't want to create a text node or even an element node in these cases, as that is inefficient.
So that is my take: with the new schema-based type system of XSLT and XPath 2.0, a way was needed to return or pass around values of these types, and so a new construct was needed.
Edit: Michael Kay says in his "XSLT 2.0 and XPath 2.0 Programmer's Reference" about xsl:sequence: "This innocent looking instruction introduced in XSLT 2.0 has far reaching effects on the capability of the XSLT language, because it means that XSLT instructions and sequence constructors (and hence functions and templates) become capable of returning any value allowed by the XPath data model. Without it, XSLT instructions could only be used to create new nodes in a result tree, but with it, they can also return atomic values and references to existing nodes."
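To make the typed-return point concrete, here is a sketch comparing three versions of a hypothetical f:square function; run it with an initial template, as in the saxon9 -it main example earlier. The xsl:sequence version returns an xs:integer directly; xsl:value-of with as="xs:integer" still works, but only because the text node it creates is atomized and cast back; without as, the caller gets a text node.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="data:,f"
    exclude-result-prefixes="xs f">
  <xsl:output method="text"/>
  <!-- Returns an xs:integer directly: no node is constructed -->
  <xsl:function name="f:square" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="$n * $n"/>
  </xsl:function>
  <!-- Builds a text node, which is then atomized and cast to xs:integer -->
  <xsl:function name="f:square-text" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:value-of select="$n * $n"/>
  </xsl:function>
  <!-- No as attribute: the caller receives the text node itself -->
  <xsl:function name="f:square-node">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:value-of select="$n * $n"/>
  </xsl:function>
  <xsl:template name="main">
    <xsl:value-of select="f:square(7) instance of xs:integer,
                          f:square-text(7) instance of xs:integer,
                          f:square-node(7) instance of text()"/>
  </xsl:template>
</xsl:stylesheet>
```

This should print true true true.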
Another use is to create a wrapper element only if it would have children. An example:
<a>
<b>node b</b>
<c>node c</c>
</a>
Somewhere in your XSLT:
<xsl:variable name="foo">
<xsl:if test="b"><d>Got a "b" node</d></xsl:if>
<xsl:if test="c"><d>Got a "c" node</d></xsl:if>
</xsl:variable>
<xsl:if test="$foo/node()">
<wrapper><xsl:sequence select="$foo"/></wrapper>
</xsl:if>
You can see the demo here: http://xsltransform.net/eiZQaFz
This is better than testing each tag like this:
<xsl:if test="b|c">...</xsl:if>
With that approach you would end up editing the conditions in two places. Also, the processing speed would depend on which tags are in your input: if the matching tag is the last one in your test, the engine will test for the presence of every one before it. As $foo/node() is an idiom for "does it have any child node?", the engine can optimize it. This makes life easier for everyone.
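XSLT 3.0 also has a dedicated instruction for this pattern, xsl:where-populated, which discards the wrapper element when its content comes out empty. This is a different technique from the xsl:sequence answer above and needs an XSLT 3.0 processor; a sketch for the same input:

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="a">
    <!-- wrapper is dropped if neither xsl:if produces anything -->
    <xsl:where-populated>
      <wrapper>
        <xsl:if test="b"><d>Got a "b" node</d></xsl:if>
        <xsl:if test="c"><d>Got a "c" node</d></xsl:if>
      </wrapper>
    </xsl:where-populated>
  </xsl:template>
</xsl:stylesheet>
```

The conditions are written once, and no intermediate variable is needed.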

Indirect variable/parameter reference (name in another property / another variable)

Is it possible using XSL to access a variable (or a parameter) whose name is stored in another variable (or parameter)? If no, why?
I am new to XSL, coming from other languages where this functionality is available, such as bash and Ant. Maybe I was wrong even to look for an answer to this question. But since I didn't find one on SO, I think there should be one.
Two examples. I have parameters p1, p2, p3. Then I have a parameter pname whose value is the string p2. I would like to read the value of p2 using pname, something like $$pname or ${$pname}. Or, in a more complicated way: if pnumber is equal to 2, then I would like to read the value of the parameter with name concat('p', $pnumber), something I would code as param-value(concat('p', $pnumber)).
This is possible when the XSLT stylesheet accesses itself as a regular XML document:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:param name="p1" select="'P1-Value'"/>
<xsl:param name="p2" select="'P2-Value'"/>
<xsl:param name="p3" select="'P3-Value'"/>
<xsl:param name="pName" select="'p3'"/>
<xsl:param name="pNumber" select="2"/>
<xsl:variable name="vDoc" select="document('')"/>
<xsl:template match="/">
<xsl:value-of select=
"concat('Param with name ',
$pName,
' has value: ',
$vDoc/*/xsl:param[@name = $pName]/@select
)"/>
<xsl:text>
</xsl:text>
<xsl:variable name="vParam" select=
"$vDoc/*/xsl:param[@name = concat('p', $pNumber)]"/>
<xsl:value-of select=
"concat('Param with name p',
$pNumber,
' has value: ',
$vParam/@select
)"/>
</xsl:template>
</xsl:stylesheet>
produces the wanted result:
Param with name p3 has value: 'P3-Value'
Param with name p2 has value: 'P2-Value'
Explanation:
The expression document('') selects the document node of the current XSLT stylesheet.
A limitation is that the current XSLT stylesheet must have (be accessible via) a URI (such as residing at a given file and accessible by its filename) -- the above code doesn't produce a correct result if the stylesheet is dynamically generated (a string in memory).
In libxslt this is possible through the EXSLT dyn:evaluate extension function. Its description mentions a total of three processors that are said to support it:
Xalan-J from Apache (version 2.4.1)
4XSLT from 4Suite (version 0.12.0a3)
libxslt from Daniel Veillard et al. (version 1.0.19)
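A sketch of the dyn:evaluate approach with the question's parameters, assuming a processor that implements the EXSLT dynamic module (for example libxslt via xsltproc) and that, as in libxslt, the evaluated expression can see the stylesheet's in-scope variables:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dyn="http://exslt.org/dynamic"
    exclude-result-prefixes="dyn">
  <xsl:output method="text"/>
  <xsl:param name="p1" select="'P1-Value'"/>
  <xsl:param name="p2" select="'P2-Value'"/>
  <xsl:param name="p3" select="'P3-Value'"/>
  <xsl:param name="pNumber" select="2"/>
  <xsl:template match="/">
    <!-- builds the string '$p2' and evaluates it as an XPath expression -->
    <xsl:value-of select="dyn:evaluate(concat('$p', $pNumber))"/>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the document('') technique, which reads the text of the select attribute in the declaration (hence the quotes in its output), this evaluates the parameter itself, so a value passed in at run time is picked up. It should print P2-Value.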
A portable workaround: if you control both the application and the stylesheet, you should pass the parameters as an XML document. Most processors give the option to make a parameter a node-set. For example, in MSXML I did it using:
xslProc.addParameter("params", xmlParams)
where xslProc is a processor object created from "Msxml2.XSLTemplate.6.0" using the createProcessor method, and xmlParams is a DOMDocument. Inside the stylesheet I accessed my parameters using something like this:
<xsl:variable name="value">
<xsl:value-of select="$params//*[name() = concat('p', $pnumber)]" />
</xsl:variable>
If the processor does not support node-set external parameters, you can always combine the parameters with the data in one XML document. This works well in memory. If access to external files is possible, you can use the document('params.xml') syntax to access parameters stored in a separate file.
I was also looking for a way to parse an XML string into a node-set, but that seems to be available only as an extension in some XSLT 2.0 processors. I wanted a 1.0 solution.
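A sketch of the document('params.xml') variant; the file name, its element layout, and the pNumber parameter are assumptions for illustration. The first part is the hypothetical parameter file, the second the stylesheet that reads it:

```xml
<!-- params.xml (hypothetical) -->
<params>
  <p1>P1-Value</p1>
  <p2>P2-Value</p2>
  <p3>P3-Value</p3>
</params>

<!-- stylesheet -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="pNumber" select="2"/>
  <!-- relative URI, resolved against the stylesheet's location -->
  <xsl:variable name="params" select="document('params.xml')/params"/>
  <xsl:template match="/">
    <!-- pick the child element whose name is built at run time -->
    <xsl:value-of select="$params/*[local-name() = concat('p', $pNumber)]"/>
  </xsl:template>
</xsl:stylesheet>
```

With the file as shown, this should print P2-Value.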