What's the rationale behind result tree fragments? - xslt

XSLT 1.0 adds an additional data type to those provided by XPath 1.0: result tree fragments.
This additional data type is called result tree fragment. A variable may be bound to a result tree fragment instead of one of the four basic XPath data-types (string, number, boolean, node-set). A result tree fragment represents a fragment of the result tree. A result tree fragment is treated equivalently to a node-set that contains just a single root node. However, the operations permitted on a result tree fragment are a subset of those permitted on a node-set. An operation is permitted on a result tree fragment only if that operation would be permitted on a string (the operation on the string may involve first converting the string to a number or boolean). In particular, it is not permitted to use the /, //, and [] operators on result tree fragments.
— https://www.w3.org/TR/xslt-10/#section-Result-Tree-Fragments
To me, this seems pointless. I cannot understand why anybody would want to do this! Result tree fragments just seem like a rubbish version of node-sets, requiring two intermediate variables and a language extension to allow a programmer to work around this seemingly arbitrary limitation.
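To make the complaint concrete, here is a minimal sketch of the workaround, assuming a processor that supports the EXSLT exsl:node-set extension. The variable names (items-rtf, items) and the item and out elements are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- A variable with content (no select attribute) is bound to a
         result tree fragment in XSLT 1.0 -->
    <xsl:variable name="items-rtf">
      <item>a</item>
      <item>b</item>
    </xsl:variable>

    <!-- Not permitted in XSLT 1.0: count($items-rtf/item),
         because / may not be applied to a result tree fragment -->

    <!-- Second variable: convert the fragment to a node-set via the extension -->
    <xsl:variable name="items" select="exsl:node-set($items-rtf)"/>

    <!-- Now ordinary path navigation is allowed -->
    <out count="{count($items/item)}"/>
  </xsl:template>
</xsl:stylesheet>
```

With any input document this should produce an out element whose count attribute is 2, but only after the detour through the second variable and the extension function.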
To further pile on the uselessness of result tree fragments, here's the compatibility shim I put together to replicate exsl:node-set in MSXSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="exsl msxsl">
  <!-- exsl:node-set -->
  <msxsl:script language="JScript" implements-prefix="exsl"><![CDATA[
    this['node-set'] = function (x) {
      return x;
    }
  ]]></msxsl:script>
</xsl:stylesheet>
This literally just returns the result tree fragment unchanged, suggesting that MSXSL doesn't even bother with implementing result tree fragment as a different type and just treats it identically to a node-set, further suggesting that there's no real point to it in the first place!
Why do result tree fragments exist?
What is the use-case?
Why were they added?
Why not just use a node-set?

I wasn't on the Working Group at the time, but the following exchange might shed some light. In April 2001, during the development of XSLT 1.1, I asked the WG:
Can any one try to explain to me why there is a perceived problem with
the "result tree fragment as node-set" facility as defined in the XSLT
1.1 WD? I keep hearing that it won't work with the XPath 2.0 type system, but I can't see why.
I can see us wanting to change it so that the data type is "node"
rather than "node-set", but apart from that, I fail to see what the
problem is.
Is it perhaps that someone has in mind doing away with the root node
of the temporary tree, and making the value of the variable instead be
the sequence of nodes that are currently modelled as children of this
root? If so, why would that change be useful?
James Clark replied:
> Is it perhaps that someone has in mind doing away with the root node of the temporary tree, and making the value of the variable instead be the sequence of nodes that are currently modelled as children of this root?
Yes.
> If so, why would that change be useful?
(a) So instructions can return nodes without copying them.
(b) So that you can use instructions to return things other than
nodes.
Explaining things more than this would require me to explain how I
hope to see XPath, XSLT and XQuery all fitting together. At this
point, let me just say that I think we need to harmonize element
construction in XSLT and XQuery. This will naturally lead to their
being much less of a gulf between expressions and instructions. I
think it will turn out to be just as awkward and inappropriate for
xsl:variable to automagically copy and wrap in a root node the value
produced by instantiating its content as it would be for it to do this
to the value produced by evaluating the expression specified in the
select attribute.
I think the WG invented the concept of "result tree fragments" because they wanted to keep options open for the future. They had ideas how the language would evolve, and they thought that making xsl:variable create a full blown node with full navigation capability would restrict the options for the future.
In retrospect I'm convinced it was a mistake, because it didn't actually achieve this objective. When we abolished RTFs in 2.0, we still found it necessary, for backwards compatibility reasons, to have the bizarre rule that xsl:variable always constructs a document node if there is no "as" attribute.
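A small sketch of that rule, assuming an XSLT 2.0 (or later) processor. The variable names wrapped and seq and the item and out elements are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- No "as" attribute: the content is wrapped in a new document node -->
    <xsl:variable name="wrapped">
      <item>a</item>
      <item>b</item>
    </xsl:variable>

    <!-- With "as": the value is the sequence of item elements itself,
         with no wrapping document node -->
    <xsl:variable name="seq" as="element(item)*">
      <item>a</item>
      <item>b</item>
    </xsl:variable>

    <out wrapped-is-document="{$wrapped instance of document-node()}"
         wrapped-items="{count($wrapped/item)}"
         seq-items="{count($seq)}"/>
  </xsl:template>
</xsl:stylesheet>
```

Here $wrapped has to be navigated with a child step to reach the items, whereas $seq already is the two items.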
It's worth noting that no-one in the WG ever imagined that people would still be using XSLT 1.0 twenty years later. 1.0 took about two years to develop and the WG fully expected that within two years, it would be completely superseded by a later version. They were therefore very willing to put restrictions in the language if they kept options open for the next version.

Related

Variable for XPath nodes

I'm starting to learn XSLT/XPath, and I copied the following from a study guide, making some modifications:
<xsl:variable name="fname" select="'polist.xml'"/>
<xsl:variable name="thePath" select="'/collection/doc'"/>
...
<xsl:value-of select="count(doc($fname)/collection/doc)"/>
It reports the number of doc elements in the XML file. The doc() function accepts the file name variable 'fname'. But if I try to do the same with the 'thePath' variable in the count() function, using $thePath instead of the "/collection/doc" text, I get an error.
Suggestions on whether/how to use the 'thePath' variable in the count() function? Is it possible? Thanks!
Learning from examples leaves you very exposed to this kind of problem: it's easy to build a completely incorrect mental model of how the examples actually work. That's why I always advise people to start by reading a good book that explains the concepts first.
In your case you've made a common mistake, which is to assume that variables work like macros, that is, that they represent fragments of XPath text that can be substituted into an expression. That's not the case: variables represent values, the result of evaluating an expression, and you can only use a variable in places where a literal value (like a number or string) could appear.
(I suspect it's the use of the $ sign that leads to this false impression. $ is often used to represent variables in macro-like languages, for example shell scripts).
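To illustrate the difference, here is a minimal sketch based on the snippet in the question, assuming an XSLT 2.0 processor (doc() is not available in 1.0). The variable theDocs is a hypothetical name added for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="fname" select="'polist.xml'"/>

  <!-- Holds a string value; it is never re-parsed as a path expression -->
  <xsl:variable name="thePath" select="'/collection/doc'"/>

  <!-- Holds the nodes that result from evaluating the path -->
  <xsl:variable name="theDocs" select="doc($fname)/collection/doc"/>

  <xsl:template match="/">
    <!-- Counts the selected doc elements; count($thePath) would instead
         be applied to a single string, not to the elements the string names -->
    <xsl:value-of select="count($theDocs)"/>
  </xsl:template>
</xsl:stylesheet>
```

Binding the variable to the selected nodes rather than to the text of the path is often enough when the path is known at the time the stylesheet is written.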
In XPath 1.0 there's no direct way of achieving what you are trying to do. In practice people either use vendor extensions for this, or they construct a pipeline in which phase 1 generates an XSLT stylesheet and phase 2 executes it (that's easier in XSLT than in most other languages, because XSLT is XML and can therefore be easily manipulated using XSLT).
In 3.0 you can evaluate XPath expressions supplied in the form of a string using the xsl:evaluate instruction. But very often, the requirement can be met better using functions. We don't know what the real underlying requirement is here so it's hard to know whether that's true in this case.
An example use of xsl:evaluate in XSLT 3.0 would be:
<xsl:evaluate xpath="'count(' || $thePath || ')'" context-item="doc($fname)"/>

Is the behavior of XSLT/XQuery regex output implementation-dependent?

Using the regular expression specifications defined for XPath and XQuery, is it possible for two different implementations of fn:analyze-string, given as inputs the same regex and match strings, to return different results and still be considered conforming to the W3C Recommendation? Or should the same inputs always return the same results across different XQuery and XSLT processors?
Specifically, I am asking about the content of match, non-match, group, and #nr values, not the base URIs or node identities (which are clearly defined as implementation dependent).
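For concreteness, this is the kind of call I mean. It is only a sketch, assuming an XSLT 3.0 processor; the input string and regex are made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Copies the analyze-string-result element returned by the function -->
    <xsl:copy-of select="analyze-string('The cat sat', '(c)(at)')"/>
  </xsl:template>
</xsl:stylesheet>
```

As I read the spec, the result should be an analyze-string-result element in the http://www.w3.org/2005/xpath-functions namespace containing a non-match element with "The ", a match element holding group nr="1" with "c" and group nr="2" with "at", and a non-match element with " sat". Those are the values I am asking about.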
There are one or two very minor aspects in which the spec is implementation-dependent:
The vendor is allowed to decide which version of Unicode to adopt as the baseline. There are some changes between versions of Unicode, for example changes to character categories, that can affect the outcome of expressions like \p{Cn} or \p{IsGreek}, or the question of whether two characters are considered case-variants of each other.
The rules for captured substrings are not quite precise in edge cases. The spec gives an example: given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero length string as the content of the captured subgroup.
Beyond that, the results should be the same across processors. But of course, this is one area where processors might decide that 100% conformance is just too hard - for example in Saxon-JS, we decided to do the best we could using the JavaScript (ECMAScript 6) regex engine, which certainly leaves us short of 100% conformance with the XPath rules.
Three terms need to be distinguished here:
Nondeterminism, which means that the same function/expression may return different results when evaluated several times with the same parameters/context (with the same implementation, in the same query).
Implementation-dependent behavior, which means that implementations may behave differently for a specific feature (but this does not mean that it cannot be deterministic within the same implementation).
Implementation-defined behavior, which is the same as implementation-dependent behavior, except that the implementation must document its behavior precisely so users can rely on it.
My understanding from the XQuery specification, but also from the XML Schema specification which defines the regular expression language, is that two implementations must return the same results to a call to fn:analyze-string, considerations on the enclosing element nodes left aside.
The XQuery specification says that the nondeterminism of fn:analyze-string is only due, as mentioned in the question, to the fact that the node identity may or may not be the same across repeated and identical calls.
The base URI and prefixes are implementation-dependent, and my understanding is that it is still implicitly meant that they must be chosen deterministically within a query.
Unless I overlooked something, the XML Schema specification does not seem to give any leeway to implementors on regular expressions. XQuery extends XML Schema regular expressions, but the only implementation-dependent feature is the capturing of some groups, which is only relevant for replacements.

XSLT Rules that break and symbols that change meaning

These are just some observations born out of frustration while learning XSLT.
The meaning of a symbol changes depending on position and context
The following three lines show the usage of the '/' character, and in each line that character symbolizes something COMPLETELY different.
In the first line it represents the root node, in the second a parent-child relationship, and in the third the descendants of the people element.
In other words, the meaning of the same character has been overloaded three times, and it changes completely depending on its position and the surrounding context.
<xsl:template match="/">
<xsl:template match="people/person">
<xsl:template match="people//name">
Rules break when combined
In the following pattern, '/' means the root node.
<xsl:template match="/">
In the following pattern, '/' means that person is a child of people.
<xsl:template match="people/person">
Now if you try to combine these two rules to reference a people element that should be a child of the root node, you write <xsl:template match="//people"> and get the expected result, but only by accident.
That is because '//' has a completely new meaning that has nothing to do with the root node or the parent-child relationship.
It means all descendants, and in this context it refers to all descendants of the root node, not just the people child.
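For comparison, here is a sketch that puts the patterns discussed above side by side in one XSLT 1.0 stylesheet. The pattern /people is not one of the patterns tried above; it is included on the assumption that the intent was to match people only when it is a child of the root node, and the template bodies are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "/" on its own: the root node -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Leading "/" followed by a step: people only as a child of the root node -->
  <xsl:template match="/people">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- "/" between steps: person whose parent is people -->
  <xsl:template match="people/person">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- "//" between steps: name anywhere below a people element -->
  <xsl:template match="people//name">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

By contrast, match="//people" would also match a people element nested deeper in the document.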
It makes me wonder
With all these problems plaguing XSLT, it makes me wonder who the people that created it were, what exactly they tried to achieve, what their guidelines were, and who the intended users are.
I am sure that these people are extremely smart, since XSLT is extremely complicated.
But I wonder if they were creating this monster only for themselves, either deliberately making it so unnecessarily complex and confusing in order to erect an artificial barrier against everybody else, or simply not caring about others who might want to use it as long as the thing seemed usable to them.

XSLT template match: recipe for moving disallowed axes to predicate

I understand that the XSLT 1.0 standard disallows most XPath axes in the StepPattern portion of a match expression. (See this question where the recommended alternative was using the desired axis in a Predicate.)
I have a complex XPath expression that returns a node set, node-set-expression. I would like to make a template matching node-set-expression/ following-sibling::*. Is there a general way to rewrite this to use Predicates so that it can be used in the match attribute of an XSLT template element?
And equivalently, is there a general way to translate the following:
node-set-expression/ preceding-sibling::*
node-set-expression/ self-and-following-sibling::* (this is shorthand; I know it's not a valid axis)
If Predicates won't work, are there any other general approaches?
In XSLT 2.0 I tend to handle such cases by preselecting the matching nodes in a global variable:
<xsl:variable name="special-nodes" select="//something/preceding-sibling::*"/>
<xsl:template match="*[. intersect $special-nodes]"/>
In XSLT 3.0 this will simplify further to
<xsl:template match="$special-nodes"/>
An advantage of doing it this way is that searching for the "special nodes" once is likely to be a lot more efficient than testing every node against every such pattern when doing an apply-templates; it also makes the condition clearer, in my view.
The only general solution I know to your question for XSLT 1.0 is to write the pattern as
<xsl:template match="*[count(.|//something/preceding-sibling::*) =
count(//something/preceding-sibling::*)]">
but that really is too horribly inefficient to contemplate.
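If node-set-expression happens to be simple enough to restate as a test on a sibling, the axis can instead be inverted inside a predicate. The following is only a sketch, assuming the expression is just //something as in the examples above; it does not generalize to an arbitrary node-set-expression.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Equivalent of //something/following-sibling::*
       (an element that has a something before it among its siblings) -->
  <xsl:template match="*[preceding-sibling::something]"/>

  <!-- Equivalent of //something/preceding-sibling::*
       (an element that has a something after it among its siblings) -->
  <xsl:template match="*[following-sibling::something]" mode="before"/>

  <!-- The "self-and-following-sibling" case: something itself
       plus every element that follows it among its siblings -->
  <xsl:template match="*[self::something or preceding-sibling::something]"
                mode="self-and-after"/>
</xsl:stylesheet>
```

The mode names are hypothetical and are there only so that the three templates do not compete with one another in a single stylesheet.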

How do I match an element that has a certain sibling element in xslt/xpath

I'm trying to match all a/c elements that have a b sibling. I've tried:
<xsl:template match="a/b/../c">
but I get "Cannot convert the expression {..} to a pattern" from Saxon.
My XSLT/XPath skills are basic at best...
<xsl:template match="a[b]/c">
Explanation: match any c element that is a child of an a element that has a b child.
You should also be able to use .. in a predicate:
<xsl:template match="a/c[../b]">
which is similar to what you were trying.
The reason you can't use .. directly in a match pattern (i.e. outside of a predicate) is that patterns are not XPath expressions, even though they look and behave very similarly. In particular, only "downward" axes (child::, descendant::, and attribute::) are allowed directly in patterns. (The only explicit axes allowed by the spec are child:: and attribute::. descendant:: is implicitly allowed via the // between pattern steps. Saxon seems to bend the rules a bit here, allowing an explicit descendant:: axis, and even descendant-or-self::!)
The reason given for the restriction on axes is that (a) other axes are rarely needed in patterns, and (b) this allows XSLT processors to be much more efficient in testing for matches.
But predicates in patterns are defined with the same syntax rules as in XPath expressions (with some restrictions in XSLT 1.0, like not allowing variable references or current()). So you can use other axes, like parent::, or its abbreviation, written as two dots (..).
This XPath expression seems to do the job (maybe not optimal):
//a|//c[following-sibling::b or preceding-sibling::b]
Edit:
In case LarsH is right, it should be //a/c instead of //a|//c.