XSLT: Understanding the context of position() within a 'match' predicate

It seems that the value of position() within the match attribute of <xsl:template> is not the same as within the template body. Here is an example:
XML:
<?xml version="1.0" encoding="utf-8" ?>
<section>
<h1>Header 1</h1>
<h1>Header <i>2</i></h1>
</section>
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template match="/">
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="section">
<xsl:apply-templates select="h1[i]"/>
</xsl:template>
<xsl:template match="h1[position() ne 1 and position() eq last()]">
<h1>
<xsl:apply-templates/>
</h1>
<dev>
cond: <xsl:value-of select="position() ne 1 and position() eq last()"/>
</dev>
<dev>pos: <xsl:value-of select="position()"/></dev>
</xsl:template>
</xsl:stylesheet>
Result:
<html>
<body>
<h1>Header 2</h1>
<dev>
cond: false
</dev>
<dev>pos: 1</dev>
</body>
</html>
The same condition that evaluates to true within the match predicate evaluates to false within the template. The position() value is 1 as expected within the template, but seems to have a different value in the match predicate. Can someone help me understand this, please? How can I avoid matching the second h1 with these conditions?

Within a match pattern, you can only use position() within a predicate, and it then refers to the position of a node within the sequence of things tested by the predicate. This will always depend only on the position of the node within the containing tree (usually, its position relative to its siblings).
As a free-standing XPath expression, position() refers to the position of a node within the sequence of nodes being processed by the current call on (typically) xsl:for-each or xsl:apply-templates. This has no necessary relationship to the position of the node in its containing tree.
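If the goal is to test the position within the sequence being processed (which is what the asker's xsl:value-of reports), one option is to match h1 by name only and move the condition into an xsl:if in the template body, where position() and last() refer to the apply-templates sequence instead of the siblings. A minimal sketch, assuming the question's input and its section template stay unchanged:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                exclude-result-prefixes="#all"
                version="3.0">
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="section">
    <!-- Selects only the second h1, so in the h1 template
         position() = 1 and last() = 1 -->
    <xsl:apply-templates select="h1[i]"/>
  </xsl:template>

  <!-- Match by name only: the pattern no longer tests sibling positions -->
  <xsl:template match="h1">
    <!-- Here position() and last() describe the apply-templates sequence -->
    <xsl:if test="position() ne 1 and position() eq last()">
      <h1>
        <xsl:apply-templates/>
      </h1>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input the condition is false for the only selected h1, so the body stays empty.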

It might help to look at the spec. The position() function is defined at https://www.w3.org/TR/xpath-functions/#func-position, which says:
Returns the context position from the dynamic context. (See Section C.2 Dynamic Context Components XP31.)
https://www.w3.org/TR/xpath-31/#id-xp-evaluation-context-components says "Context position dynamic; changes during evaluation of path expressions and predicates".
Furthermore, https://www.w3.org/TR/xslt-30/#focus explains that instructions like apply-templates or for-each change the focus, which consists of the context item, the context position and the context size.
So, based on that, it should come as no surprise that position() inside a predicate of a match pattern and position() inside a template body do not give the same value.
Pattern matching is explained in detail at https://www.w3.org/TR/xslt-30/#patterns, and examples like "para[1] matches any para element that is the first para child element of its parent." should make that clear.
The text further on then says: "The formal definition, however, is useful for understanding the meaning of a pattern such as para[1]. This matches any node selected by the expression root(.)//(child-or-top::para[1]): that is, any para element that is the first para child of its parent, or a para element that has no parent.".
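Applied to the question, h1[position() ne 1 and position() eq last()] therefore matches an h1 that is the last of at least two h1 children of its parent, which is the second h1 in the sample, regardless of what apply-templates selected. As a sketch, the same nodes can be matched with sibling axes, which makes the dependence on the tree explicit:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <!-- Matches the same nodes as
       h1[position() ne 1 and position() eq last()]:
       an h1 with an earlier h1 sibling and no later h1 sibling -->
  <xsl:template match="h1[preceding-sibling::h1 and not(following-sibling::h1)]">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
</xsl:stylesheet>
```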

Related

XSLT: Getting an attribute from the root node in a template

How can I get the "locale" attribute of the urlset tag inside a template? I have tried many variants, but none of them works...
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="https://kazik.reved-rfs.ru/sitemap.xsl"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" locale="hu-hu">
<xsl:template match="sitemap:urlset">
...
<td>
<xsl:value-of select="urlset/@locale"/>
</td>
Inside the template for sitemap:urlset, if you use an XPath like urlset/@locale, it will look for a urlset element that is a child of the sitemap:urlset element, and then for the @locale attribute on that element.
To address the @locale attribute of the matched sitemap:urlset element, use:
<xsl:value-of select="@locale"/>
You could also use an explicit XPath that "jumps up" to the top of the tree and selects from that element (but since you are already "standing on" that matched element, I'd use the relative XPath above):
<xsl:value-of select="/sitemap:urlset/@locale"/>
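The sitemap: prefix used in the question's pattern must be bound in the stylesheet to the namespace that urlset is in. A minimal sketch, assuming the prefix name sitemap and a bare table/tr around the td (both only illustrative); the unprefixed locale attribute is in no namespace, so plain @locale finds it:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
    exclude-result-prefixes="sitemap">
  <xsl:output method="html"/>

  <!-- urlset is in the sitemap namespace, so it is matched via the prefix -->
  <xsl:template match="sitemap:urlset">
    <table>
      <tr>
        <td>
          <!-- Relative path: the locale attribute of the matched element.
               Unprefixed attributes are in no namespace, so no prefix here. -->
          <xsl:value-of select="@locale"/>
        </td>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>
```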

XSLT: Text filter for a subset of certain nodes

Suppose the following XHTML:
...
<a>
skipcontent_1
<div>
skipcontent_2
</div>
<div id='info'>
showcontent_3
<div>
showcontent_4
</div>
<c>
skipcontent_5
</c>
</div>
</a>
...
I would like to output only the text of those div elements that have id='info' or are below such a div.
So the result should look like this:
showcontent_3
showcontent_4
(indentation/newlines are not important)
I tried hard to make an identity transformation behave like this, but already failed at making it ignore the nodes/content "outside" of a <div id='info'/> section...
It sounds to me like your problem is a better fit for a stylesheet without an identity template. This stylesheet satisfies your requirements given that input:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="text()" priority="1">
<!-- Do not output text normally -->
</xsl:template>
<xsl:template match="div/text()[ancestor::div[@id = 'info']]" priority="2">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
To deconstruct a bit:
Generally, you want to output nothing. That's the reason for the first template, since the default behavior in XSLT for a text node is to copy it to the output.
You do want certain text nodes to be copied to the output, so the second template handles those. Its match expression finds text nodes that satisfy two conditions: (1) they are children of a div element (div/text()), and (2) they are somewhere beneath a div element whose @id attribute is "info" ([ancestor::div[@id = 'info']]).
(The second template's match expression could be written, more or less equivalently, as div[@id = 'info']//text()[parent::div].)
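For comparison, here is a select-driven sketch that visits only the info sections instead of suppressing everything else. It assumes, as in the sample, that the div elements are in no namespace (real XHTML in the http://www.w3.org/1999/xhtml namespace would need a prefixed name such as h:div), and it starts only from the outermost info divs so that nested ones are not output twice:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Outermost div elements with id='info' only -->
    <xsl:for-each select="//div[@id = 'info'][not(ancestor::div[@id = 'info'])]">
      <!-- Text children of this div and of every div below it;
           text inside other elements (such as c) is skipped -->
      <xsl:copy-of select="descendant-or-self::div/text()"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```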

XSLT: Match first descendant element

I'm trying to match the first bar element that occurs as a descendant of a foo element in an xsl match pattern and struggling. Initial attempt:
<xsl:template match="//foo//bar[1]">
...
</xsl:template>
fails because there are several bar elements that match. So:
<xsl:template match="(//foo//bar)[1]">
...
</xsl:template>
but that fails to compile.
Tricky. I don't know how efficient or otherwise this would be, but you could turn the pattern on its head and move the logic into a predicate (which is allowed to use axes other than child, attribute and //):
<xsl:template match="foo//bar[not(preceding::bar/ancestor::foo)]">
(any bar inside a foo provided there isn't another bar-inside-a-foo before it). Alternatively you could try a key trick similar to the way Muenchian grouping works, which may be more efficient
<!-- trick key - all matching nodes will end up with the same key value - all
we care about is finding whether a particular node is the first such node
in the document or not. -->
<xsl:key name="fooBar" match="foo//bar" use="1" />
<xsl:template match="foo//bar[generate-id() = generate-id(key('fooBar', 1)[1])]">
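One caveat with the first pattern, assuming bar elements can nest (as in the sample input of the last answer in this thread): the preceding axis excludes ancestors, so a bar nested inside the first bar has no preceding bar and matches as well. The key-based version does not have this problem, because key() returns nodes in document order. A sketch of the pattern that also rules out an enclosing bar-inside-a-foo (the first-bar output element is hypothetical):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- First bar (in document order) inside some foo:
       no earlier bar-in-foo and no enclosing bar-in-foo -->
  <xsl:template match="foo//bar[not(preceding::bar/ancestor::foo)
                               and not(ancestor::bar/ancestor::foo)]">
    <!-- first-bar is a hypothetical output element -->
    <first-bar>
      <xsl:value-of select="text()"/>
    </first-bar>
  </xsl:template>

  <!-- Suppress the built-in copying of all other text -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```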
I originally said you cannot do this with match expressions; in fact you can, just apparently not in every XSLT processor.
Either way, I'd use an <xsl:if>:
<xsl:template match="foo//bar">
<xsl:if test="generate-id() = generate-id(ancestor::foo[1]//bar)">
<!-- ... -->
</xsl:if>
</xsl:template>
This ensures that only the first descendant <bar> per <foo> (!) is processed any further.
NB: When given a node-set, generate-id() returns the ID of the first node in the set.
An alternative solution is based on the rule of thumb "use advanced XPath in select, not in match": match elements in templates simply by name, like foo (not even //foo).
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>bar1<bar>bar2</bar></bar>
<bar>bar3</bar>
</foo>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="foo">
<xsl:apply-templates select="descendant::bar[1]"/>
</xsl:template>
<xsl:template match="bar">
<!--only the first bar was selected --><xsl:value-of select="text()"/>
</xsl:template>
</xsl:stylesheet>
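The question's first attempt matched several elements because // abbreviates /descendant-or-self::node()/, so in foo//bar[1] the [1] applies to the child step: it matches every bar that is the first bar child of its parent. In descendant::bar[1] the predicate applies to the whole descendant axis instead. A small sketch, assuming the same sample input, that prints both counts:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- bar1 (first bar child of foo) and bar2 (first bar child of bar1) -->
    <xsl:value-of select="count(foo//bar[1])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- only bar1, the first bar descendant of foo -->
    <xsl:value-of select="count(foo/descendant::bar[1])"/>
  </xsl:template>
</xsl:stylesheet>
```

With that input the stylesheet prints 2 and then 1.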

XSLT: need alternative to document()-function for multi-source processing

I'm adapting an XSLT from a third party which transforms an arbitrary number of XMLs into a single HTML document. It's a pretty complex script and it will be revised in the future, so I'm trying to do a minimal adaptation in order to get it to work for our needs.
The following is a stripped-down version of the XSLT (containing the essentials):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="text" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:param name="files" select="document('files.xml')//File"/>
<xsl:param name="root" select="document($files)"/>
<xsl:template match="/">
<xsl:for-each select="$root/RootNode">
<xsl:apply-templates select="."/>
</xsl:for-each>
</xsl:template>
<xsl:template match="RootNode">
<xsl:for-each select="//Node">
<xsl:text>Node: </xsl:text><xsl:value-of select="."/><xsl:text>, </xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Now files.xml contains a list of all the URLs of the files to be included (in this case the local files file1.xml and file2.xml). Because we want to read XMLs from memory rather than from disk, and because the invocation of the XSLT only allows for a single XML source, I have combined the two files into a single XML document. The following is a combination of two files (there may be more in a real situation):
<?xml version="1.0" encoding="UTF-8"?>
<TempNode>
<RootNode>
<Node>1</Node>
<Node>2</Node>
</RootNode>
<RootNode>
<Node>3</Node>
<Node>4</Node>
</RootNode>
</TempNode>
where the first RootNode originally resided in file1.xml and the second in file2.xml.
Due to the complexity of the actual XSLT, I've figured that my best shot is to try to alter the $root-param. I've tried the following:
<xsl:param name="root" select="/TempNode"/>
The problem is this. In the case of <xsl:param name="root" select="document($files)"/>, the XPath expression "//Node" in <xsl:for-each select="//Node"> selects the Node elements from file1.xml and file2.xml independently, i.e. producing the following (desired) list:
Node: 1, Node: 2, Node: 3, Node: 4,
However, when I combine the content of the two files into a single XML and parse this (and use the suggested $root definition), the expression "//Node" selects all Node elements that are descendants of the TempNode. (In other words, the desired list, as represented above, is produced twice due to the combination with the outer <xsl:for-each select="$root/RootNode"> loop.)
(A side note: as observed in comment a) in this page, document() apparently changes the root node, perhaps explaining this behavior.)
My question becomes:
How can I re-define $root, using the combined XML as source instead of a multi-source through document(), so that the list is only produced once, without touching the remainder of the XSLT? It's as if, when $root is defined using the document() function, there is no common root node in the param. Is it possible to define a param with two "separate" node trees?
Btw: I've tried defining a document like this:
<xsl:param name="root">
<xsl:for-each select="/TempNode/*">
<xsl:document>
<xsl:copy-of select="."/>
</xsl:document>
</xsl:for-each>
</xsl:param>
thinking it might solve the problem, but the "//Node" expression still fetches all the Nodes. Is the context node in the <xsl:template match="RootNode">-template actually somewhere in the input document and not the param? (Honestly, I'm pretty confused when it comes to context nodes.)
Thanks in advance!
(Updated more)
OK, some of the problem is becoming clear. First, just to make sure I understand, you aren't actually passing parameters for $files and $root to the XSLT processor invocation, right? (They might as well be variables rather than params?)
Now to the main issues... In XPath, when you evaluate an expression that begins with "/" (including "//"), the context node is ignored [mostly]. Therefore, when you have
<xsl:template match="RootNode">
<xsl:for-each select="//Node">
the matched RootNode is ignored. Maybe you wanted
<xsl:template match="RootNode">
<xsl:for-each select=".//Node">
in which the for-each would select Node elements that are descendants of the matched RootNode? This would fix your problem of generating the desired node list twice.
I inserted [mostly] above because I recalled that an "absolute location path" starts from "the root node of the document containing the context node". So the context node does affect what document is used for "//Node". Maybe that's what you intended all along? I guess I was slow to catch on to that.
(A side note: as observed in comment a) in this page, document() apparently changes the root node, perhaps explaining this behavior.)
Or more precisely,
An absolute location path ["/..."] followed by a relative location path... selects the set of nodes that would be selected by the relative location path relative to the root node of the document containing the context node.
document() doesn't actually change anything, in the sense of side effects; rather, it returns a set of nodes contained (usually) by different documents than the primary source document. XSLT instructions like xsl:apply-templates and xsl:for-each establish new values for the context node inside the scope of their template bodies. So if you use xsl:apply-templates and xsl:for-each with select="document(...)/...", the context node inside the scope of those instructions will belong to an external document, so any use of "/..." as an XPath will start from that external document.
Updated again
How can I re-define $root, using the combined XML as source instead of a multi-source through document(), so that the list is only produced once, without touching the remainder of the XSLT?
As @Alej hinted, it's really not possible given the above constraint. If you're selecting "//Node" in each iteration of the loop over "$root/RootNode", then in order for each iteration not to select the same nodes as the other iterations, each value of "$root/RootNode" must be in a different document. Since you're using the combined XML source instead of a multi-source, this is not possible.
But if you don't insist that your <xsl:for-each select="//..."> XPath expression must stay unchanged, it becomes very easy. :-) Just put a "." before the "//".
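A minimal sketch of that one-character change applied to the stripped-down stylesheet, assuming the combined XML is the single source document and $root simply selects /TempNode as tried in the question:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- All RootNode elements now live in the one combined source document -->
  <xsl:param name="root" select="/TempNode"/>

  <xsl:template match="/">
    <xsl:for-each select="$root/RootNode">
      <xsl:apply-templates select="."/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="RootNode">
    <!-- ".//" starts at the matched RootNode, not at the document root -->
    <xsl:for-each select=".//Node">
      <xsl:text>Node: </xsl:text><xsl:value-of select="."/><xsl:text>, </xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the combined input this produces the list once: Node: 1, Node: 2, Node: 3, Node: 4,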
It's as if, when $root is defined using the document() function, there is no common root node in the param.
The value of the param is a node-set. All nodes in the set may be contained in the same document, or they may not, depending on whether the first argument to document() is a node-set or just a single node.
Is it possible to define a param with two "separate" node trees?
I believe by "separate", you mean "belonging to different documents"? Yes it is, but I don't think you can do it in XSLT 1.0 unless you're selecting nodes that belong to different documents in the first place.
You mentioned trying
<xsl:param name="root">
<xsl:for-each select="/TempNode/*">
<xsl:document>
<xsl:copy-of select="."/>
</xsl:document>
</xsl:for-each>
</xsl:param>
but <xsl:document> is not defined in XSLT 1.0, and your stylesheet says version="1.0". Do you have XSLT 2.0 available? If so, let us know and we can pursue this option. To be honest, <xsl:document> is not familiar territory for me. But I'm happy to learn along with you.
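If an XSLT 2.0 (or later) processor such as Saxon is available, one possible approach, offered as an untested sketch rather than a verified recipe, is to give the param an as attribute so that its value is the sequence of document nodes produced by xsl:document. Without as, the content is copied into a single temporary tree, so all RootNode copies share one root again. With it, each RootNode sits in its own document and the unchanged //Node only sees that document's Node elements:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- as="document-node()*": keep each xsl:document result as a separate
       document instead of merging them into one temporary tree -->
  <xsl:param name="root" as="document-node()*">
    <xsl:for-each select="/TempNode/*">
      <xsl:document>
        <xsl:copy-of select="."/>
      </xsl:document>
    </xsl:for-each>
  </xsl:param>

  <xsl:template match="/">
    <xsl:for-each select="$root/RootNode">
      <xsl:apply-templates select="."/>
    </xsl:for-each>
  </xsl:template>

  <!-- Unchanged: "//" now starts at the root of the RootNode's own document -->
  <xsl:template match="RootNode">
    <xsl:for-each select="//Node">
      <xsl:text>Node: </xsl:text><xsl:value-of select="."/><xsl:text>, </xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that the relative order of nodes from different documents is implementation-dependent, so $root/RootNode is not guaranteed to return the groups in source order.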
You can apply templates only to the nodes you need:
Input:
<?xml version="1.0" encoding="UTF-8"?>
<TempNode>
<RootNode>
<Node>1</Node>
<Node>2</Node>
</RootNode>
<RootNode>
<Node>3</Node>
<Node>4</Node>
</RootNode>
</TempNode>
XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/">
<xsl:copy>
<xsl:apply-templates select="TempNode/RootNode"/>
</xsl:copy>
</xsl:template>
<xsl:template match="RootNode">
<xsl:value-of select="concat('RootNode-', generate-id(.), '&#xA;')"/>
<xsl:apply-templates select="Node"/>
</xsl:template>
<xsl:template match="Node">
<xsl:value-of select="concat('Node', ., '&#xA;')"/>
</xsl:template>
</xsl:stylesheet>
Output:
RootNode-N65540
Node1
Node2
RootNode-N65549
Node3
Node4

How to select these elements with XPath?

I have a document, something like this:
<root>
<A node="1"/>
<B node="2"/>
<A node="3"/>
<A node="4"/>
<B node="5"/>
<B node="6"/>
<A node="7"/>
<A node="8"/>
<B node="9"/>
</root>
Using XPath, how can I select all B elements that consecutively follow a given A element?
It's something like following-sibling::B, except that I want only the immediately following elements.
If I am on A (node==1), then I want to select node 2.
If I am on A (node==3), then I want to select nothing.
If I am on A (node==4), then I want to select 5 and 6.
Can I do this in XPath? EDIT: It is within an XSL stylesheet select statement.
EDIT2: I don't want to use the node attribute on the various elements as a unique identifier. I included the node attribute only to illustrate my point. In the actual XML doc, I don't have an attribute that I use as a unique identifier. The XPath "following-sibling::UL[preceding-sibling::LI[1]/@node = current()/@node]" keys on the node attribute, and that's not what I want.
Short answer (assuming current() is ok, since this is tagged xslt):
following-sibling::B[preceding-sibling::A[1]/@node = current()/@node]
Example stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:template match="/">
<xsl:apply-templates select="/root/A"/>
</xsl:template>
<xsl:template match="A">
<div>A: <xsl:value-of select="@node"/></div>
<xsl:apply-templates select="following-sibling::B[preceding-sibling::A[1]/@node = current()/@node]"/>
</xsl:template>
<xsl:template match="B">
<div>B: <xsl:value-of select="@node"/></div>
</xsl:template>
</xsl:stylesheet>
Good luck!
While @Chris Nielsen's answer is the right approach, it leaves an uncertainty in cases where the compared attribute is not unique. The more correct way of solving this is:
following-sibling::B[
generate-id(preceding-sibling::A[1]) = generate-id(current())
]
This makes sure that the preceding-sibling::A is identical to the current A, instead of just comparing some attribute values. Unless you have attributes that are guaranteed to be unique, this is the only safe way.
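If the stylesheet can use XSLT 2.0 or later, XPath's is operator compares node identity directly and expresses the same test without generate-id(). A minimal sketch, assuming the sample root/A/B document from the question; the result wrapper element is hypothetical and only there to give the output a single root:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- result is a hypothetical wrapper element -->
    <result>
      <xsl:apply-templates select="/root/A"/>
    </result>
  </xsl:template>

  <xsl:template match="A">
    <div>A: <xsl:value-of select="@node"/></div>
    <!-- B siblings whose nearest preceding A is this very A element;
         @node is only used for display, never for identity -->
    <xsl:apply-templates select="following-sibling::B[preceding-sibling::A[1] is current()]"/>
  </xsl:template>

  <xsl:template match="B">
    <div>B: <xsl:value-of select="@node"/></div>
  </xsl:template>
</xsl:stylesheet>
```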
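A solution might be to first gather up all the following siblings using following-sibling::*, take the first of these, and require it to be a B element:
following-sibling::*[position()=1][name()='B']
Note that this selects at most one element, the B immediately after the context A (for node 4 it returns only node 5, not node 6), so on its own it does not return the whole run of consecutive B elements.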