XSLT Rules that break and symbols that change meaning

These are just some observations born out of frustration while learning XSLT.
Meaning of the symbols changes depending on position and context
The following three lines show the usage of the '/' character, and in each line that character symbolizes something COMPLETELY different.
In the first line it represents the root node, in the second a parent-child relationship, and in the third the descendants of the people element.
In other words, the same character has been overloaded three times, and its meaning changes completely depending on its position and the surrounding context.
<xsl:template match="/">
<xsl:template match="people/person">
<xsl:template match="people//name">
Rules break when combined
In the following expression '/' means root node.
<xsl:template match="/">.
In the following expression '/' means that person is child of people.
<xsl:template match="people/person">.
Now if you try to combine these two rules to reference people, which should be a child of the root node, you write <xsl:template match="//people"> and get the expected result, but only by accident.
That is because '//' has a completely new meaning that has nothing to do with the root node or the parent-child relationship.
It means all descendants, and in this context it refers to every people element that is a descendant of the root node, not just the people child.
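For comparison, here is a minimal sketch of how these patterns behave side by side. The input document and the output strings are made up for illustration. It assumes that the pattern for "people as a child of the root node" is written /people, which is how XSLT 1.0 patterns spell it.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <people><person><name>A</name></person><group><people/></group></people> -->

  <!-- "/" on its own: the root (document) node. The select expression
       "//people" picks every people element in the document, at any depth. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//people"/>
  </xsl:template>

  <!-- "/people": a people element whose parent is the root node. -->
  <xsl:template match="/people">
    <xsl:text>people directly under the root&#10;</xsl:text>
  </xsl:template>

  <!-- "people": any people element. It only fires here for the nested one,
       because "/people" has the higher default priority (0.5 versus 0). -->
  <xsl:template match="people">
    <xsl:text>people somewhere deeper&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the hypothetical input above, the outer people element should be handled by the /people template and the nested one by the plain people template.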
It makes me wonder
With all these problems plaguing XSLT, it makes me wonder who the people were that created it, what exactly they tried to achieve, what their guidelines were and who the intended users are.
I am sure that these people are extremely smart, since XSLT is extremely complicated.
But I wonder if they were creating this monster only for themselves, either deliberately making it so unnecessarily complex and confusing in order to erect an artificial barrier against everybody else, or simply not caring about others who might want to use it as long as the thing seemed usable to them.

Related

Variable for XPath nodes

I'm starting to learn XSLT/XPath, and I copied the following from a study guide, making some modifications:
<xsl:variable name="fname" select="'polist.xml'"/>
<xsl:variable name="thePath" select="'/collection/doc'"/>
...
<xsl:value-of select="count(doc($fname)/collection/doc)"/>
It reports the number of doc elements in the XML file. The doc() function accepts the file name variable 'fname'. But if I try to do the same with the 'thePath' variable in the count() function, using $thePath instead of the "/collection/doc" text, I get an error.
Suggestions on whether/how to use the 'thePath' variable in the count() function? Is it possible? Thanks!

Learning from examples leaves you very exposed to this kind of problem: it's easy to build a completely incorrect mental model of how the examples actually work. That's why I always advise people to start by reading a good book that explains the concepts first.
In your case you've made a common mistake, which is to assume that variables work like macros, that is, that they represent fragments of XPath text that can be substituted into an expression. That's not the case: variables represent values, the result of evaluating an expression, and you can only use a variable in places where a literal value (like a number or string) could appear.
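To make the difference concrete, here is a small sketch of a variable holding a value rather than a piece of expression text. The variable name theDocs is made up for illustration, and the stylesheet assumes an XSLT 2.0 or later processor (because of doc()) and that polist.xml can be found relative to the stylesheet.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="fname" select="'polist.xml'"/>

  <!-- A string value. It can be used wherever a string literal could appear,
       but it is never re-parsed as a path. -->
  <xsl:variable name="thePath" select="'/collection/doc'"/>

  <!-- A node sequence. The path is evaluated once, here, and the variable
       holds the resulting doc elements (theDocs is a hypothetical name). -->
  <xsl:variable name="theDocs" select="doc($fname)/collection/doc"/>

  <xsl:template match="/">
    <!-- Works: count() receives the nodes held in the variable. -->
    <xsl:value-of select="count($theDocs)"/>
    <!-- By contrast, doc($fname)$thePath is a syntax error, and
         doc($fname)/$thePath does not navigate to /collection/doc. -->
  </xsl:template>
</xsl:stylesheet>
```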
(I suspect it's the use of the $ sign that leads to this false impression. $ is often used to represent variables in macro-like languages, for example shell scripts).
In XPath 1.0 there's no direct way of achieving what you are trying to do. In practice people either use vendor extensions for this, or they construct a pipeline in which phase 1 generates an XSLT stylesheet and phase 2 executes it (that's easier in XSLT than in most other languages, because XSLT is XML and can therefore be easily manipulated using XSLT).
In 3.0 you can evaluate XPath expressions supplied in the form of a string using the xsl:evaluate instruction. But very often, the requirement can be met better using functions. We don't know what the real underlying requirement is here so it's hard to know whether that's true in this case.

An example use of xsl:evaluate in XSLT 3 would be:
<xsl:evaluate xpath="'count(' || $thePath || ')'" context-item="doc($fname)"/>
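Placed in a complete stylesheet, that instruction might look like the sketch below. It assumes an XSLT 3.0 processor on which xsl:evaluate is available and not disabled, and it turns the two variables from the question into stylesheet parameters purely for illustration.

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:param name="fname" select="'polist.xml'"/>
  <xsl:param name="thePath" select="'/collection/doc'"/>

  <xsl:template match="/">
    <!-- The xpath attribute is itself an expression whose value is the string
         to evaluate, here "count(/collection/doc)". The leading "/" of that
         string resolves against the document supplied as context-item. -->
    <xsl:evaluate xpath="'count(' || $thePath || ')'"
                  context-item="doc($fname)"/>
  </xsl:template>
</xsl:stylesheet>
```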

What's the rationale behind result tree fragments?

XSLT 1.0 adds an additional data type to those provided by XPath 1.0: result tree fragments.
This additional data type is called result tree fragment. A variable may be bound to a result tree fragment instead of one of the four basic XPath data-types (string, number, boolean, node-set). A result tree fragment represents a fragment of the result tree. A result tree fragment is treated equivalently to a node-set that contains just a single root node. However, the operations permitted on a result tree fragment are a subset of those permitted on a node-set. An operation is permitted on a result tree fragment only if that operation would be permitted on a string (the operation on the string may involve first converting the string to a number or boolean). In particular, it is not permitted to use the /, //, and [] operators on result tree fragments.
— https://www.w3.org/TR/xslt-10/#section-Result-Tree-Fragments
To me, this seems pointless. I cannot understand why anybody would want to do this! Result tree fragments just seem like a rubbish version of node-sets, requiring two intermediate variables and a language extension to allow a programmer to work around this seemingly arbitrary limitation.
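For anyone who hasn't hit the limitation yet, here is roughly what it looks like in practice. This is only a sketch: the variable name and its content are made up, and the second call assumes a processor that supports the EXSLT exsl:node-set extension function.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <!-- A variable built from content: in XSLT 1.0 its value is a
       result tree fragment, not a node-set. -->
  <xsl:variable name="items">
    <item>a</item>
    <item>b</item>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Allowed: anything you could do with a string. -->
    <xsl:value-of select="string-length($items)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Not allowed in strict XSLT 1.0: count($items/item) -->
    <!-- Workaround: convert the fragment to a node-set first. -->
    <xsl:value-of select="count(exsl:node-set($items)/item)"/>
  </xsl:template>
</xsl:stylesheet>
```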
To further pile on the uselessness of result tree fragments, here's the compatibility shim I put together (read: borrowed) to replicate exsl:node-set in MSXSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="exsl msxsl">
  <!-- exsl:node-set -->
  <msxsl:script language="JScript" implements-prefix="exsl"><![CDATA[
    this['node-set'] = function (x) {
      return x;
    }
  ]]></msxsl:script>
</xsl:stylesheet>
This literally just returns the result tree fragment unchanged, suggesting that MSXSL doesn't even bother with implementing result tree fragment as a different type and just treats it identically to a node-set, further suggesting that there's no real point to it in the first place!
Why do result tree fragments exist?
What is the use-case?
Why were they added?
Why not just use a node-set?

I wasn't on the Working Group at the time, but the following exchange might shed some light. In April 2001, during the development of XSLT 1.1, I asked the WG:
Can any one try to explain to me why there is a perceived problem with
the "result tree fragment as node-set" facility as defined in the XSLT
1.1 WD? I keep hearing that it won't work with the XPath 2.0 type system, but I can't see why.
I can see us wanting to change it so that the data type is "node"
rather than "node-set", but apart from that, I fail to see what the
problem is.
Is it perhaps that someone has in mind doing away with the root node
of the temporary tree, and making the value of the variable instead be
the sequence of nodes that are currently modelled as children of this
root? If so, why would that change be useful?
James Clark replied:
Is it perhaps that someone has in mind doing away with the root node of the temporary tree, and making the value of the variable instead be the sequence
of nodes that are currently modelled as children of this root?
Yes.
If so, why would that change be useful?
(a) So instructions can return nodes without copying them.
(b) So that you can use instructions to return things other than
nodes.
Explaining things more than this would require me to explain how I
hope to see XPath, XSLT and XQuery all fitting together. At this
point, let me just say that I think we need to harmonize element
construction in XSLT and XQuery. This will naturally lead to their
being much less of a gulf between expressions and instructions. I
think it will turn out to be just as awkward and inappropriate for
xsl:variable to automagically copy and wrap in a root node the value
produced by instantiating its content as it would be for it to do this
to the value produced by evaluating the expression specified in the
select attribute.
I think the WG invented the concept of "result tree fragments" because they wanted to keep options open for the future. They had ideas how the language would evolve, and they thought that making xsl:variable create a full blown node with full navigation capability would restrict the options for the future.
In retrospect I'm convinced it was a mistake, because it didn't actually achieve this objective. When we abolished RTFs in 2.0, we still found it necessary, for backwards compatibility reasons, to have the bizarre rule that xsl:variable always constructs a document node if there is no "as" attribute.
It's worth noting that no-one in the WG ever imagined that people would still be using XSLT 1.0 twenty years later. 1.0 took about two years to develop and the WG fully expected that within two years, it would be completely superseded by a later version. They were therefore very willing to put restrictions in the language if they kept options open for the next version.

What are the differences between 'call-template' and 'apply-templates' in XSL?

I am new to XSLT, so I'm a little bit confused about the two tags,
<xsl:apply-templates name="nodes">
and
<xsl:call-template select="nodes">
So can you list out the difference between them?

<xsl:call-template> is a close equivalent to calling a function in a traditional programming language.
You can define functions in XSLT, like this simple one that outputs a string.
<xsl:template name="dosomething">
<xsl:text>A function that does something</xsl:text>
</xsl:template>
This function can be called via <xsl:call-template name="dosomething">.
<xsl:apply-templates> is a little different, and in it lies the real power of XSLT: it takes any number of XML nodes (whatever you define in the select attribute), processes each of them, and finds matching templates for them. Somebody could say that apply-templates works like a loop, but this is not exactly the case: the nodes may be processed in any order, even in parallel, although the results are always assembled in the order of the selected nodes:
<!-- sample XML snippet -->
<xml>
  <foo /><bar /><baz />
</xml>
<!-- sample XSLT snippet -->
<xsl:template match="xml">
  <xsl:apply-templates select="*" /> <!-- three nodes selected here -->
</xsl:template>
<xsl:template match="foo"> <!-- will be called once -->
  <xsl:text>foo element encountered</xsl:text>
</xsl:template>
<xsl:template match="*"> <!-- will be called twice -->
  <xsl:text>other element encountered</xsl:text>
</xsl:template>
This way you give up a little control to the XSLT processor: you no longer decide where the program flow goes. The processor does, by finding the most appropriate match for the node it's currently processing.
If multiple templates can match a node, the one with the more specific match expression wins. If more than one matching template with the same specificity exists, the one declared last wins.
You can concentrate more on developing templates and need less time to do "plumbing". Your programs will become more powerful and modularized, less deeply nested and faster (as XSLT processors are optimized for template matching).
A concept to understand with XSLT is that of the "current node". With <xsl:apply-templates> the current node moves on with every iteration, whereas <xsl:call-template> does not change the current node. I.e. the . within a called template refers to the same node as the . in the calling template. This is not the case with apply-templates.
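A small sketch of the "current node" behaviour described above. The template name show-name and the people/person input are only illustrative.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <people><person/><person/></people> -->

  <xsl:template match="people">
    <!-- call-template: inside show-name the current node is still people. -->
    <xsl:call-template name="show-name"/>
    <!-- apply-templates: each selected person becomes the current node in turn. -->
    <xsl:apply-templates select="person"/>
  </xsl:template>

  <xsl:template match="person">
    <xsl:call-template name="show-name"/>
  </xsl:template>

  <!-- Prints the name of whatever node is current at the point of the call. -->
  <xsl:template name="show-name">
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the hypothetical input, show-name should print people once (called from the people template) and then person once for each person element, even though it is the same named template each time.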
This is the basic difference. There are some other aspects of templates that affect their behavior: Their mode and priority, the fact that templates can have both a name and a match. It also has an impact whether the template has been imported (<xsl:import>) or not. These are advanced uses and you can deal with them when you get there.

To add to the good answer by @Tomalak:
Here are some unmentioned and important differences:
xsl:apply-templates is much richer and deeper than xsl:call-template, and even than xsl:for-each, simply because we don't know what code will be applied on the nodes of the selection: in the general case this code will be different for different nodes of the node-list.
The code that will be applied can be written way after the xsl:apply-templates was written, and by people who do not know the original author.
The FXSL library's implementation of higher-order functions (HOF) in XSLT wouldn't be possible if XSLT didn't have the <xsl:apply-templates> instruction.
Summary: Templates and the <xsl:apply-templates> instruction are how XSLT implements and deals with polymorphism.
Reference: See this whole thread: http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200411/msg00546.html

xsl:apply-templates is usually (but not necessarily) used to process all or a subset of children of the current node with all applicable templates. This supports the recursiveness of XSLT application which is matching the (possible) recursiveness of the processed XML.
xsl:call-template on the other hand is much more like a normal function call. You execute exactly one (named) template, usually with one or more parameters.
So I use xsl:apply-templates if I want to intercept the processing of an interesting node and (usually) inject something into the output stream. A typical (simplified) example would be
<xsl:template match="foo">
<bar>
<xsl:apply-templates/>
</bar>
</xsl:template>
whereas with xsl:call-template I typically solve problems like adding the text of some subnodes together, transforming select nodesets into text or other nodesets and the like - anything you would write a specialized, reusable function for.
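As a rough illustration of the "reusable function" style, here is a sketch of a named template with parameters. The template name join, its parameters and the person/firstName/lastName input are all made up for the example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: writes the string values of $nodes,
       separated by $separator. -->
  <xsl:template name="join">
    <xsl:param name="nodes"/>
    <xsl:param name="separator" select="', '"/>
    <xsl:for-each select="$nodes">
      <xsl:value-of select="."/>
      <!-- No separator after the last node. -->
      <xsl:if test="position() != last()">
        <xsl:value-of select="$separator"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Called like a function, with arguments passed via xsl:with-param. -->
  <xsl:template match="person">
    <xsl:call-template name="join">
      <xsl:with-param name="nodes" select="firstName | lastName"/>
      <xsl:with-param name="separator" select="' '"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```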
Edit:
As an additional remark to your specific question text:
<xsl:call-template name="nodes"/>
This calls a template which is named 'nodes':
<xsl:template name="nodes">...</xsl:template>
This is a different semantic than:
<xsl:apply-templates select="nodes"/>
...which applies all templates to all children of your current XML node whose name is 'nodes'.

The functionality is indeed similar (apart from the calling semantics, where call-template requires a name attribute and a corresponding named template).
However, the processor will not execute them the same way.
From MSDN:
Unlike <xsl:apply-templates>, <xsl:call-template> does not change the current node or the current node-list.

When the same XML element matches two XSLT templates through different XPaths, which template executes and why?

Consider this XML:
<people>
<person>
<firstName>Deane</firstName>
<lastName>Barker</lastName>
</person>
</people>
What if two XSLT templates match an element through different XPaths? I know that if the "match" element on two templates is identical (which should never happen, I don't think), the last template will fire.
However, consider this XSL:
<xsl:template match="person/firstName">
Template #1
</xsl:template>
<xsl:template match="firstName">
Template #2
</xsl:template>
The "firstName" element will match on either of these templates -- the first one as a child of "person" and the second one standalone.
I have tested this, and Template #1 executes, while Template #2 does not. What is the operative principle behind this? I can think of three things:
Specificity of XPath (highly doubtful)
Location in the XSLT file (also doubtful)
Some pre-emption of Template #2 by Template #1. Something happens during the execution of Template #1 that tells Template #2 not to execute.

Your first point is actually correct: there is a defined order, described in https://www.w3.org/TR/1999/REC-xslt-19991116#conflict. According to the spec, person/firstName has a default priority of 0.5 while firstName has a default priority of 0. You can also specify the priority yourself using the priority attribute on xsl:template.

I know that if the "match" element on
two templates is identical (which
should never happen, I don't think)
This can happen, but there would not be much point in having two templates with the same match.
From the spec:
It is an error if this leaves more
than one matching template rule. An
XSLT processor may signal the error;
if it does not signal the error, it
must recover by choosing, from amongst
the matching template rules that are
left, the one that occurs last in the
stylesheet.
So in other words, you may get an error, or it will just use the last template in your XSLT, depending on how the processor you are using has been written to handle this situation.

Note that the value of the match attribute is not an XPath expression (though it uses a subset of XPath syntax). It's an XSLT pattern. Absent explicit priority attributes, the choice comes down to which pattern has the highest default priority:
person/firstName has a default priority of .5
firstName has a default priority of 0
Thus, person/firstName wins.
A complete explanation of how conflict resolution works can be found here (although I recommend you study the entire chapter, "How XSLT Works"): Conflict Resolution for Template Rules
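As a sketch of how the priority attribute changes the outcome, here are the two templates from the question with an explicit priority on the second one. The value 1 is arbitrary. Any value above 0.5 should have the same effect.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Default priority 0.5: the pattern has more than one step. -->
  <xsl:template match="person/firstName">Template #1</xsl:template>

  <!-- The default priority would be 0. The explicit priority of 1
       puts this rule ahead of the one above. -->
  <xsl:template match="firstName" priority="1">Template #2</xsl:template>
</xsl:stylesheet>
```

With the priority attribute in place, Template #2 should be the one chosen for firstName. Remove the attribute and Template #1 wins again.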

Consider this with the context in mind. The first one matches and changes the context (so the second does not match). The context is set AFTER the first one is selected and processed, so the visible element from that context no longer contains "firstName".
If you want both to execute, you can call them instead so that the context changes back to the top.
<xsl:template match="people">
  <xsl:apply-templates select="person/firstName"/>
  <xsl:apply-templates select="firstName"/>
</xsl:template>

XSL/XPath Indentation

What conventions (if any) do you use for indenting XSL code?
how do you deal with really long, complicated XPaths
can you plug them into your XML editor of choice?
is there some open source code that does the job well?
For some background, I use nxml-mode in Emacs. For the most part it's OK, and you can configure the number of spaces that child elements should be indented. It's not very good, though, when it comes to complicated XPaths. If I have a long XPath in my code, I like to make its structure as transparent as possible by making it look something like this...
<xsl:for-each select="/some
                      /very[@test = 'whatever']
                      /long[@another-test = perhaps
                                            /another
                                            /long
                                            /xpath[@goes='here']]
                      /xpath"
However, I currently have to do that manually as nxml will just align it all up with the "/some.."

Sometimes a longer xpath can't be avoided, even if you use templates instead of for-eaches (like you should, if you can). This is especially true in XSLT/XPath 2.0:
<xsl:attribute name="tablevel"
               select="if (following::*[self::topic | self::part])
                       then (following::*[self::topic | self::part])[1]/@tablevel
                       else @tablevel"/>
I tend not to break a "simple" path across lines, but will break the "greater" path at operators or conditionals.
For editing, I use Oxygen (which is cross-platform) and it handles this kind of spacing pretty well. Sometimes it doesn't predict what you want exactly, but it will maintain the space once it's there, even if you re-indent your code.

In my opinion, long xpaths are hard to read and should be avoided. There are 2 ways to do it:
Simplify the source xml.
Split big templates into smaller ones.

Don't use long xpaths. Ditch the for-each and use match templates. Break down the xpath into several templates. It's much easier to read a bunch of trivial match templates than one of these.

I tend to break down the XSL differently if I'm having difficulty reading the xpath statements (which isn't very often, but it happens occasionally)... it's actually rather similar to my methods of breaking up syntax for other languages... So your example in the question might become something more like this:
<xsl:for-each select="/some/very[@test = 'whatever']/long">
  <xsl:if test="@another-test = perhaps/another/long/xpath[@goes='here']">
    <xsl:for-each select="xpath">
      ... result xml ....
    </xsl:for-each>
  </xsl:if>
</xsl:for-each>
