Debug possible escaped-text issue in XSLT? - xslt

I have an XSL template that gives different results when used in two different contexts.
The template manifesting the defect is:
<xsl:template match="*" mode="blah">
<!-- snip irrelevant stuff -->
<xsl:if test="see">
<xsl:message>Contains a cross-ref. <xsl:value-of select="."/></xsl:message>
</xsl:if>
<xsl:apply-templates select="."/>
</xsl:template>
Given:
<el>This is a '<see cref="foo"/>' cross-referenced element.</el>
In one situation, I get the desired result:
Contains a cross-ref. This is a ' ' cross-referenced element.
(the <see/> is being dealt with as an XML element and is ultimately matched by another template.)
But in another situation, the xsl:if doesn't trigger, and if I output the contents with <xsl:message><xsl:value-of select="."/></xsl:message>, I get:
This is a '<see cref="foo"/>' cross-referenced element.
It seems to me that in the latter improperly-behaving scenario, it's acting like it's been output-escaped. Does that make sense? Am I barking up the wrong tree? This is a typically complex XSL situation and trying to trace the call-stack is difficult; is there a particular XSLT processing command I should be looking for?
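One way to narrow this down is to ask the processor how it actually sees the element. The stylesheet below is only a sketch: the match on el and the wording of the message are assumptions for illustration. If the content reached the template as escaped text (for example from a CDATA section or as &lt;see .../&gt; in the input), the element has no see child, count(*) reports 0, and the string value contains the literal markup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical probe: report how each <el> was parsed -->
  <xsl:template match="el">
    <xsl:message>
      <xsl:text>child elements: </xsl:text>
      <!-- 1 when <see/> is a real element, 0 when it is only text -->
      <xsl:value-of select="count(*)"/>
      <xsl:text>; string value contains literal markup: </xsl:text>
      <!-- true only when the tag characters are part of the text -->
      <xsl:value-of select="contains(., '&lt;see')"/>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

Running the same probe in both contexts should show whether the two inputs really are the same tree, which is a cheaper first step than tracing the whole call stack.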

xsl 3.0: How to process certain child elements first in xsl:apply-templates, then the remainder (overriding document order)

Assume my xml input is a MFMATR element with a few child elements, such as: TRLIST, INTRO, and SBLIST -- in that document order. I am converting to HTML.
I have a template that matches on the MFMATR element, and wants to run xsl:apply-templates on the 3 child elements, but I want INTRO to be processed first (listed first in the HTML). The other two (TRLIST and SBLIST) should keep their relative document order, as long as INTRO comes before both of them.
So I'd like to run <xsl:apply-templates select="INTRO, *"> but not have INTRO matched twice. (Using this syntax with xsl 3.0 causes dupes for me.) I also don't want to explicitly list every tag in the select expression, so unknown tags will still be processed.
A 2nd real life example is this: <xsl:apply-templates select="TITLE, CHGDESC, *"/>. Again, right now that is causing dupes I don't want.
I am using Saxon.
So I'd like to run <xsl:apply-templates select="INTRO, *"> but not have INTRO matched twice
Try:
<xsl:apply-templates select="INTRO, * except INTRO"/>
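A minimal sketch of that in a complete stylesheet, assuming XSLT 2.0 or later (the except operator is not available in 1.0). The div wrapper and the rule that prints each child's name are placeholders added for illustration, not part of the original stylesheet. The comma binds more loosely than except, so the expression reads as INTRO followed by (* except INTRO).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="MFMATR">
    <div>
      <!-- INTRO first, then every other child in document order -->
      <xsl:apply-templates select="INTRO, * except INTRO"/>
    </div>
  </xsl:template>

  <!-- Placeholder rule (an assumption): emit the element name so the order is visible -->
  <xsl:template match="MFMATR/*">
    <p><xsl:value-of select="name()"/></p>
  </xsl:template>
</xsl:stylesheet>
```

For children in the order TRLIST, INTRO, SBLIST this processes INTRO, then TRLIST, then SBLIST, and unknown children are still picked up by the * step.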
This seems to work. If someone has a better answer, let me know and I will change it.
There is no DRY violation here -- no repeated element names or variable names. I want it to look clean at all the call sites I will have.
It seems idiomatic to me since the function was pulled from w3's own website!
<xsl:template match="MFMATR">
<!-- Process INTRO first, no matter where it appears -->
<xsl:variable name="nodes" select="INTRO, *"/>
<xsl:apply-templates select="kp:distinct_nodes_stable($nodes)"/>
</xsl:template>
<xsl:template match="INTRO">
<xsl:variable name="nodes" select="TITLE, CHGDESC, *"/>
<xsl:apply-templates select="kp:distinct_nodes_stable($nodes)"/>
</xsl:template>
<!-- Discard duplicate elements in $seq, but keep their ordering -->
<!-- Adapted from https://www.w3.org/TR/xpath-functions/#func-distinct-nodes-stable -->
<xsl:function name="kp:distinct_nodes_stable" as="node()*">
<xsl:param name="seq" as="node()*"/>
<xsl:sequence select="fold-left($seq, (),
function($foundSoFar as node()*, $this as node()) as node()* {
if ($foundSoFar intersect $this)
then $foundSoFar
else ($foundSoFar, $this)
}) "/>
</xsl:function>

Simple XSLT template failing in some cases

Part of some XSLT I am working on is this very simple template to show up an unresolved reference type of error.
<!-- a basic check when matching on copying index elements - are they referring to a defined item element -->
<xsl:template match="index" mode="expand">
<xsl:variable name="index_name_xml"><xsl:value-of select="@name"/></xsl:variable>
<xsl:if test="not(//item[@name=$index_name_xml])">
<xsl:message terminate="yes"><xsl:value-of select="concat('FAIL : cannot find &quot;',$index_name_xml,'&quot; in items')"/></xsl:message>
</xsl:if>
</xsl:template>
When this element
<index name="User X Ordinate"/>
is matched in the input document, the above template is called, and the template's XPath SHOULD find this node (in the input document)
<item name="User X Ordinate" address="UserXOrd_s" usage="realtime" type="uint16_t" unit="unit_ordinate_q8" />
but it doesn't and I get my fail message
FAIL : cannot find "User X Ordinate" in dbitems
Error at char 7 in xsl:value-of/@select on line 253 column 130 of db_expander.xsl:
XTMM9000: Processing terminated by xsl:message at line 253 in db_expander.xsl
and I am scratching my head, as there are dozens of cases in my transformation where the template does what I want, and TWO cases where it doesn't (a clue I can't figure out yet). I can't see any spelling errors, and the two slashes in the XPath should mean ALL 'item' elements at any level in the document are checked. I can't see how it doesn't work.
EDIT :: Apologies for this amateurish post. I kind of got lost trying to recreate a simple version of the problem where I could post the whole source. My partial understanding is that the problem may be related to how the XSL is passing the node/context into the template -- it's slightly out of my depth at the moment but -- result tree fragment / context in the source XML?
However, if I add a 'root' variable (shown below) the template does what I want -- the problems are gone -- so the problem seems to be related to the context being passed. I tried but failed to make a small standalone example that fails, to post here -- my tests kept working... so I am obviously still not grasping some finer point(s) yet.
<xsl:variable name="root" select="/"/>
<xsl:template match="index" mode="expand">
<xsl:variable name="index_name"><xsl:value-of select="@name"/></xsl:variable>
<xsl:choose>
<xsl:when test="$root//dbgroup//item[@name=$index_name]">
<!--xsl:message terminate="no">
<xsl:value-of select="concat('item found for : ',$index_name, ' (parent is ',parent::node()/@name,')')"/>
</xsl:message-->
</xsl:when>
<xsl:otherwise>
<xsl:message terminate="no">
<xsl:value-of select="concat('item NOT found for : ',$index_name, ' (parent is ',parent::node()/@name,')')"/>
</xsl:message>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
I need to do some more reading, as I don't know a good way to debug this other than xsl:message...
If a template is supplied with an RTF (result tree fragment) then unfortunately '//*' doesn't refer to ANY element in the hierarchy of the source document anymore, but rather ANY element in the RTF hierarchy which does not contain all the elements (in my case / most cases) of the source document.
Hence why I needed to use the $root variable inside the template in my 'EDIT' above in order to get access to elements not in the RTF.
The trick is knowing that you will end up with an RTF when you apply templates within a variable declaration in order to populate it. And so when this is passed to a template, you will need another way to get back to the context of your source document.
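A small sketch of the effect, assuming an XSLT 2.0 processor (the XTMM9000 code in the error above suggests one), where the variable content is a temporary tree that can be processed directly. The $expanded variable and the way it is populated are hypothetical; only the index and item element names come from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Global handle on the source document, captured before any temporary tree exists -->
  <xsl:variable name="root" select="/"/>

  <xsl:template match="/">
    <!-- Hypothetical: copying into a variable creates NEW nodes in a separate tree -->
    <xsl:variable name="expanded">
      <xsl:copy-of select="//index"/>
    </xsl:variable>
    <xsl:apply-templates select="$expanded/index" mode="expand"/>
  </xsl:template>

  <xsl:template match="index" mode="expand">
    <xsl:message>
      <!-- "//" starts at the root of the tree holding the context node: the temporary tree -->
      <xsl:text>items found via //: </xsl:text>
      <xsl:value-of select="count(//item[@name = current()/@name])"/>
      <!-- $root still points at the source document -->
      <xsl:text>; via $root: </xsl:text>
      <xsl:value-of select="count($root//item[@name = current()/@name])"/>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

With a matching item in the source document, the first count is 0 and the second is 1 or more, which is the same asymmetry described in the EDIT.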

Constructing, not selecting, XSL node set variable

I wish to construct an XSL node set variable using a contained for-each loop. It is important that the constructed node set is the original (a selected) node set, not a copy.
Here is a much simplified version of my problem (which could of course be solved with a select, but that's not the point of the question). I've used the <name> node to test that the constructed node set variable is in fact in the original tree and not a copy.
XSL version 1.0, processor is msxsl.
Non-working XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<xsl:output method="text" encoding="iso-8859-1" omit-xml-declaration="yes" />
<xsl:template match="/">
<xsl:variable name="entries">
<xsl:for-each select="//entry">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="entryNodes" select="msxsl:node-set($entries)"/>
<xsl:for-each select="$entryNodes">
<xsl:value-of select="/root/name"/>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XML input:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<name>X</name>
<entry>1</entry>
<entry>2</entry>
</root>
Wanted output:
X1X2
Actual output:
12
Of course the (or a) problem is the copy-of, but I can't work out a way around this.
There isn't a "way around it" in XSLT 1.0 - it's exactly how this is supposed to work. When you have a variable that is declared with content rather than with a select then that content is a result tree fragment consisting of newly-created nodes (even if those nodes are a copy of nodes from the original tree). If you want to refer to the original nodes attached to the original tree then you must declare the variable using select. A better question would be to detail the actual problem and ask how you could write a suitable select expression to find the nodes you want without needing to use for-each - most uses of xsl:if or xsl:choose can be replaced with suitably constructed predicates, maybe involving judicious use of xsl:key, etc.
In XSLT 2.0 it's much more flexible. There's no distinction between node sets and result tree fragments, and the content of an xsl:variable is treated as a generic "sequence constructor" which can give you new nodes if you construct or copy them:
<xsl:variable name="example" as="node()*">
<xsl:copy-of select="//entry" />
</xsl:variable>
or the original nodes if you use xsl:sequence:
<xsl:variable name="example" as="node()*">
<xsl:sequence select="//entry" />
</xsl:variable>
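Applied to the stylesheet in the question, that could look like the following sketch. It assumes an XSLT 2.0 processor, so it will not run under msxsl; the for-each is kept only to mirror the original structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- as="node()*" makes the variable a sequence rather than a temporary tree -->
    <xsl:variable name="entries" as="node()*">
      <xsl:for-each select="//entry">
        <!-- xsl:sequence returns the original node, not a copy -->
        <xsl:sequence select="."/>
      </xsl:for-each>
    </xsl:variable>

    <xsl:for-each select="$entries">
      <!-- resolves because each entry still belongs to the source document -->
      <xsl:value-of select="/root/name"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the XML input shown in the question this gives X1X2. Swapping xsl:sequence for xsl:copy-of brings the original problem back, since the copies have no /root/name above them.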
I wish to construct an XSL node set variable using a contained
for-each loop.
I have no idea what that means.
It is important that the constructed node set is the original (a
selected) node set, not a copy.
This part I think I understand a little better. It seems you need to replace:
<xsl:variable name="entries">
<xsl:for-each select="//entry">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:variable>
with:
<xsl:variable name="entries" select="//entry"/>
or, preferably:
<xsl:variable name="entries" select="root/entry"/>
The resulting variable is a node-set of the original entry nodes, so you can do simply:
<xsl:for-each select="$entries">
<xsl:value-of select="/root/name"/>
<xsl:value-of select="."/>
</xsl:for-each>
to get your expected result.
Of course, you could do the same thing by operating directly on the original nodes, in their original context - without requiring the variable.
In response to the comments you've made:
We obviously need a better example here, but I think I am getting a vague idea of where you want to go with this. But there are a few things you must understand first:
1.
In order to construct a variable which contains a node-set of nodes in their original context, you must use select. This does not place any limits whatsoever on what you can select. You can do your selection all at once, or in stages, or even in a loop (here I mean a real loop). You can combine the intermediate selections you have made in any way sets can be combined: union, intersection, or difference. But you must use select in all these steps, otherwise you will end up with a set of new nodes, no longer having the context they did in the source tree.
IOW, the only difference between using copy and select is that the former creates new nodes, which is precisely what you wish to avoid.
2.
xsl:for-each is not a loop. It has no hierarchy or chronology. All the nodes are processed in parallel, and there is no way to use the result of previous iteration in the current one - because no iteration is "previous" to another.
If you try to use xsl:for-each in order to add each of n processed nodes to a pre-existing node-set, you will end up with n results, each containing the pre-existing node-set joined with one of the processed nodes.
3.
I think you'll find the XPath language is quite powerful, and allows you to select the nodes you want without having to go through the complicated loops you hint at.
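As a sketch of point 1, here is a selection built in stages and then combined with a union, all with select and all in XSLT 1.0. The odd/even split is an arbitrary assumption chosen to have two stages; the input is the one from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Each stage uses select, so each variable holds original nodes -->
    <xsl:variable name="odd" select="/root/entry[. mod 2 = 1]"/>
    <xsl:variable name="even" select="/root/entry[. mod 2 = 0]"/>
    <!-- Union of the stages: still nodes of the source tree -->
    <xsl:variable name="entries" select="$odd | $even"/>

    <xsl:for-each select="$entries">
      <!-- /root/name is reachable because the nodes kept their document -->
      <xsl:value-of select="/root/name"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the two-entry input above, the output is X1X2, and no extension function is needed because nothing was ever copied.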
It might help if you showed us a problem that can't be trivially solved in XSLT 1.0. You can't solve your problem the way you are asking for: there is no equivalent of xsl:sequence in XSLT 1.0. But the problem you have shown us can be solved without such a construct. So please explain why you need what you are asking for.

XPath sorting inline if statement

I've been trying to wrap my head around using XPath and XQuery for this with the help of some previous posts to no avail. Right now I have null child nodes which should just default to ordering at the end of a sort but unfortunately, the sort does not occur at all on these null nodes. As a result I have been trying to find a way to set them to zero during the sorting section. Here is a sample below:
<xsl:for-each select="MyItems/Item">
<xsl:sort select="Order/obj/Number" order="ascending">
I want to do something similar to an inline if statement as part of the sort like in C# below:
foreach(item in MyItems.OrderBy(Order/obj/Exists != false ? Order/obj/Number : 0)
I was using these links: dynamic xpath expression and XSLT transfom with inline if statements to try and understand but I'm still not getting it. Any help is appreciated. I need the solution in XSLT.
Your situation is unclear as you say nothing about the contents of your XML or the nature of your XSLT transform. But it sounds something like you have Item elements with no Order/obj/Number elements to sort on?
I would code that something like this
<xsl:template match="/root">
<xsl:copy>
<xsl:apply-templates select="MyItems/Item[Order/obj/Number]">
<xsl:sort select="Order/obj/Number" />
</xsl:apply-templates>
<xsl:apply-templates select="MyItems/Item[not(Order/obj/Number)]" />
</xsl:copy>
</xsl:template>
<xsl:template match="MyItems/Item">
<xsl:copy-of select="current()" />
</xsl:template>
Talking about "null nodes" isn't helpful. It's not a well-defined term. Show us your XML, your desired results and your actual results, and we can help you.
What should happen is that if the select expression in xsl:sort returns an empty sequence/node-set, the effective sort key is a zero-length string, so these items sort before any others (assuming ascending order).
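If the goal is the opposite, items without a Number sorting last, one common idiom is a leading sort key that is 0 when the number exists and 1 when it does not. This is a sketch under the assumption that the input has the MyItems/Item/Order/obj/Number shape implied by the question; nothing else about the real document is known.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="MyItems">
    <xsl:copy>
      <xsl:for-each select="Item">
        <!-- Primary key: 0 if a Number exists, 1 if it is missing -->
        <xsl:sort select="number(not(Order/obj/Number))" data-type="number"/>
        <!-- Secondary key: the number itself, compared numerically -->
        <xsl:sort select="Order/obj/Number" data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The first key plays the role of the inline if from the C# pseudo-code, without needing a conditional expression inside xsl:sort.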

In what order do templates in an XSLT document execute, and do they match on the source XML or the buffered output?

Here is something that has always mystified me about XSLT:
In what order do the templates execute, and
When they execute, do they match on (a) the original source XML, or (b) the current output of the XSLT to that point?
Example:
<person>
<firstName>Deane</firstName>
<lastName>Barker</lastName>
</person>
Here is a fragment of XSLT:
<!-- Template #1 -->
<xsl:template match="/">
<xsl:value-of select="firstName"/> <xsl:value-of select="lastName"/>
</xsl:template>
<!-- Template #2 -->
<xsl:template match="/person/firstName">
First Name: <xsl:value-of select="firstName"/>
</xsl:template>
Two questions about this:
I am assuming that Template #1 will execute first. I don't know why I assume this -- is it just because it appears first in the document?
Will Template #2 execute? It matches a node in the source XML, but by the time we get to this template (assuming it runs second), the "firstName" node will not be in the output tree.
So, are "later" templates beholden to what has occurred in "earlier" templates, or do they operate on the source document, oblivious to what has been transformed "prior" to them? (All those words are in quotes, because I find it hard to discuss time-based issues when I really have little idea how template order is determined in the first place...)
In the above example, we have a template that matches on the root node ("/") that -- when it is done executing -- has essentially removed all nodes from the output. This being the case, would this pre-empt all other templates from executing since there is nothing to match on after that first template is complete?
To this point, I've been concerned with later templates not executing because the nodes they have operated on do not appear in the output, but what about the inverse? Can an "earlier" template create a node that a "later" template can do something with?
On the same XML as above, consider this XSL:
<!-- Template #1 -->
<xsl:template match="/">
<fullName>
<xsl:value-of select="firstName"/> <xsl:value-of select="lastName"/>
</fullName>
</xsl:template>
<!-- Template #2 -->
<xsl:template match="//fullName">
Full Name: <xsl:value-of select="."/>
</xsl:template>
Template #1 creates a new node called "fullName". Template #2 matches on that same node. Will Template #2 execute because the "fullName" node exists in the output by the time we get around to Template #2?
I realize that I'm deeply ignorant about the "zen" of XSLT. To date, my stylesheets have consisted of a template matching the root node, then are completely procedural from there. I'm tired of doing this. I would rather actually understand XSLT correctly, hence my question.
I love your question. You're very articulate about what you do not yet understand. You just need something to tie things together. My recommendation is that you read "How XSLT Works", a chapter I wrote to address exactly the questions you're asking. I'd love to hear if it ties things together for you.
Less formally, I'll take a stab at answering each of your questions.
In what order do the templates execute, and
When they execute, do they match on (a) the original source XML, or (b)
the current output of the XSLT to that
point?
At any given point in XSLT processing, there are, in a sense, two contexts, which you identify as (a) and (b): where you are in the source tree, and where you are in the result tree. Where you are in the source tree is called the current node. It can change and jump all around the source tree, as you choose arbitrary sets of nodes to process using XPath. However, conceptually, you never "jump around" the result tree in the same way. The XSLT processor constructs it in an orderly fashion; first it creates the root node of the result tree; then it adds children, building the result in document order (depth-first). [Your post motivates me to pick up my software visualization for XSLT experiments again...]
The order of template rules in a stylesheet never matters. You can't tell, just by looking at the stylesheet, in what order the template rules will be instantiated, how many times a rule will be instantiated, or even whether it will be at all. (match="/" is an exception; you can always know that it will get triggered.)
I am assuming that Template #1 will
execute first. I don't know why I
assume this -- is it just because it
appears first in the document?
Nope. It would be called first even if you put it last in the document. Template rule order never matters (except under an error condition when you have more than one template rule with the same priority matching the same node; even then, it's optional for the implementor and you should never rely on such behavior). It gets called first because the first thing that always happens whenever you run an XSLT processor is a virtual call to <xsl:apply-templates select="/"/> . The one virtual call constructs the entire result tree. Nothing happens outside it. You get to customize, or "configure", the behavior of that instruction by defining template rules.
Will Template #2 execute? It matches a node in the source XML, but
by the time we get to this
template (assuming it runs second),
the "firstName" node will not be in
the output tree.
Neither Template #2 nor any other template rule will ever get triggered unless you have an <xsl:apply-templates/> call somewhere in the match="/" rule. If you don't have any, then no template rules other than match="/" will get triggered. Think of it this way: for a template rule to get triggered, it can't just match a node in the input. It has to match a node that you elect to process (using <xsl:apply-templates/>). Conversely, it will continue to match the node as many times as you choose to process it.
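A minimal sketch of the question's first stylesheet with that call added. The select path and the use of . inside the second rule are my adjustments for illustration, not the asker's code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Template #1: elects to process firstName, which is what triggers Template #2 -->
  <xsl:template match="/">
    <xsl:apply-templates select="person/firstName"/>
  </xsl:template>

  <!-- Template #2: the context node is the firstName element itself -->
  <xsl:template match="/person/firstName">
    <xsl:text>First Name: </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

For the person document in the question this outputs First Name: Deane. Remove the apply-templates line and Template #2 never runs, wherever it sits in the file.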
Would [the match="/"
template] pre-empt all other templates
from executing since there is nothing
to match on after that first template
is complete?
That rule preempts the rest by nowhere including <xsl:apply-templates/> in it. There are still plenty of nodes that could be processed in the source tree. They're always all there, ripe for the picking; process each one as many times as you want. But the only way to process them using template rules is to call <xsl:apply-templates/>.
To this point, I've been concerned
with later templates not executing
because the nodes they have operated
on do not appear in the output, but
what about the inverse? Can an
"earlier" template create a node that
a "later" template can do something
with?
It's not that an "earlier" template creates a new node to be processed; it's that an "earlier" template in turn processes more nodes from the source tree, using that same instruction (<xsl:apply-templates/>). You can think of it as calling the same "function" recursively, with different parameters each time (the nodes to process as determined by the context and the select attribute).
In the end, what you get is a tree-structured stack of recursive calls to the same "function" (<xsl:apply-templates>). And this tree structure is isomorphic to your actual result. Not everyone realizes this or has thought about it this way; that's because we don't have any effective visualization tools...yet.
Template #1 creates a new node called
"fullName". Template #2 matches on
that same node. Will Template #2
execute because the "fullName" node
exists in the output by the time we
get around to Template #2?
Nope. The only way to do a chain of processing is to explicitly set it up that way. Create a variable, e.g., $tempTree, that contains the new <fullName> element and then process it, like this: <xsl:apply-templates select="$tempTree"/>. To do this in XSLT 1.0, you need to wrap the variable reference with an extension function (e.g., exsl:node-set()), but in XSLT 2.0 it will work just as is.
Whether you're processing nodes from the original source tree or in a temporary tree that you construct, either way you need to explicitly say what nodes you want to process.
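Here is a sketch of such a chain, assuming XSLT 2.0 so that no extension function is needed. The person/ prefix on the paths and the explicit space between the names are my additions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Stage 1: build the new element inside a temporary tree -->
    <xsl:variable name="tempTree">
      <fullName>
        <xsl:value-of select="person/firstName"/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="person/lastName"/>
      </fullName>
    </xsl:variable>
    <!-- Stage 2: explicitly elect to process the node just built -->
    <xsl:apply-templates select="$tempTree/fullName"/>
  </xsl:template>

  <!-- Fires only because stage 2 asked for it -->
  <xsl:template match="fullName">
    <xsl:text>Full Name: </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

For the person document in the question this outputs Full Name: Deane Barker. In XSLT 1.0 the second stage would select from exsl:node-set($tempTree) instead, where the processor supports that extension.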
What we haven't covered is how XSLT gets all its implicit behavior. You must also understand the built-in template rules. I write stylesheets all the time that don't even include an explicit rule for the root node (match="/"). Instead, I rely on the built-in rule for root nodes (apply templates to children), which is the same as the built-in rule for element nodes. Thus I can ignore large parts of the input, let the XSLT processor automatically traverse it, and only when it comes across a node I'm interested in will I do something special. Or I could write a single rule that copies everything recursively (called the identity transform), overriding it only where necessary, to make incremental changes to the input. After you've read "How XSLT Works", your next assignment is to look up the "identity transform".
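For reference, the usual form of the identity transform is sketched below, with one override to show the incremental-change pattern. The rename of lastName to surname is a made-up example, not something from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every attribute and node, recursing through apply-templates -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical override: rename lastName, keep its attributes and content -->
  <xsl:template match="lastName">
    <surname>
      <xsl:apply-templates select="@*|node()"/>
    </surname>
  </xsl:template>
</xsl:stylesheet>
```

Everything the override does not match falls through to the identity rule and is copied unchanged.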
I realize that I'm deeply ignorant
about the "zen" of XSLT. To date, my
stylesheets have consisted of a
template matching the root node, then
are completely procedural from there.
I'm tired of doing this. I would
rather actually understand XSLT
correctly, hence my question.
I applaud you. Now it's time to take the "red pill": read "How XSLT Works"
Templates always match against the source XML, so their order in the stylesheet doesn't really matter unless two or more templates match the same node(s) with the same priority. In that case, somewhat counter-intuitively, the last matching template in the stylesheet is the one a processor will typically pick.
In your 1st example Template #1 runs because processing of the input XML begins at the root, and that is the only template in your stylesheet that matches the root node. Even if it were 2nd in the stylesheet it would still run 1st.
In this example Template #2 will not run, because Template #1 has already handled the root and never asks for anything below it to be processed. If you do want to process other elements using additional templates, you should change it to:
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
This then allows you to define a template for each element you are interested in and process the xml in a more logical way, rather than doing it procedurally.
Also note that this example will not output anything as at the current context (the root) there is no firstName element, only a person element so it should be:
<xsl:template match="/">
<xsl:value-of select="person/firstName"/> <xsl:value-of select="person/lastName"/>
</xsl:template>
I find it easier to think of it as stepping through the XML, starting at the root, looking for the template that matches each node, and then following those instructions to generate the output. The XSLT transforms the input document into the output, so the output document is empty at the start of the transformation. The output is not used as part of the transformation; it is just the result of it.
In your 2nd example Template #2 will not execute because the template is run against the input xml not the output.
Evan's answer is basically a good one.
However one thing which does seem to be lacking is the ability to "call" up chunks of code without doing any matching. This would - at least in some people's opinion - enable much better structuring.
I have made a small example in an attempt to show what I mean.
<xsl:template match="/" name="dotable">
<!-- Surely the common html part could be placed somewhere else -->
<!-- the head and the opening body -->
<html>
<head><title>Salary table details</title></head>
<body>
<!-- Comments are better than nothing -->
<!-- but that part should really have been somewhere else ... -->
<!-- Now do what we really want here ... this really is making the table! -->
<h1>Salary Table</h1>
<table border = "3" width="80%">
<xsl:for-each select="//entry">
<tr>
<td><xsl:value-of select="name" /></td>
<td><xsl:value-of select="firstname" /></td>
<td><xsl:value-of select="age" /></td>
<td><xsl:value-of select="salary" /></td>
</tr>
</xsl:for-each>
</table>
<!-- Now close out the html -->
</body>
</html>
<!-- this should also really be somewhere else -->
<!-- This approach works, but leads to horribly monolithic code -->
<!-- Further - it leads to templates including code which is strictly -->
<!-- not relevant to them. I've not found a way round this yet -->
</xsl:template>
However, after fiddling around a bit, and at first making use of the hint that if there are two matching templates the last one in the code will be selected, and then restructuring my code (not all shown here), I achieved this which seems to work, and hopefully generates the correct code, as well as displaying the wanted data -
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- <?xml version="1.0"?>-->
<xsl:template name="dohtml">
<html>
<xsl:call-template name="dohead" />
<xsl:call-template name="dobody" />
</html>
</xsl:template>
<xsl:template name="dohead">
<head>
<title>Salary details</title>
</head>
</xsl:template>
<xsl:template name="dobody">
<body>
<xsl:call-template name="dotable" />
</body>
</xsl:template>
<xsl:template match="/entries" name="dotable">
<h1>Salary Table</h1>
<table border = "3" width="80%">
<xsl:for-each select="//entry">
<tr>
<td><xsl:value-of select="name" /></td>
<td><xsl:value-of select="firstname" /></td>
<td><xsl:value-of select="age" /></td>
<td><xsl:value-of select="salary" /></td>
</tr>
</xsl:for-each>
</table>
</xsl:template>
<xsl:template match="/" name="main">
<xsl:call-template name="dohtml" />
</xsl:template>
</xsl:stylesheet>
The way this works is that the main template always matches, because it matches on /.
It calls the chunks of code (the named templates).
This means it is no longer possible to match another template on /, but it is possible to match
explicitly on a named node, which in this case is the highest-level node in the XML, called entries.
A small modification to the code produced the example given above.