XSLT line counter - is it that hard?

I have cheated every time I've needed to do a line count in XSLT by using JScript, but in this case I can't do that. I simply want to write out a line counter throughout an output file. This basic example has a simple solution:
<xsl:for-each select="Records/Record">
<xsl:value-of select="position()"/>
</xsl:for-each>
Output would be:
1
2
3
4
etc...
But what if the structure is more complex, with nested for-each loops:
<xsl:for-each select="Records/Record">
<xsl:value-of select="position()"/>
<xsl:for-each select="Records/Record">
<xsl:value-of select="position()"/>
</xsl:for-each>
</xsl:for-each>
Here, the inner for-each just resets the counter (so you get 1, 1, 2, 3, 2, 1, 2, 3, 1, 2, etc.). Does anyone know how I can output the position in the file (i.e. a line count)?

While it is quite impossible to mark the line numbers for the serialization of an XML document (because this serialization per se is ambiguous), it is perfectly possible, and easy, to number the lines of regular text.
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:call-template name="numberLines"/>
</xsl:template>
<xsl:template name="numberLines">
<xsl:param name="pLastLineNum" select="0"/>
<xsl:param name="pText" select="."/>
<xsl:if test="string-length($pText)">
<xsl:value-of select="concat($pLastLineNum+1, ' ')"/>
<xsl:value-of select="substring-before($pText, '&#xA;')"/>
<xsl:text>&#xA;</xsl:text>
<xsl:call-template name="numberLines">
<xsl:with-param name="pLastLineNum"
select="$pLastLineNum+1"/>
<xsl:with-param name="pText"
select="substring-after($pText, '&#xA;')"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<t>The biggest airlines are imposing "peak travel surcharges"
this summer. In other words, they're going to raise fees
without admitting they're raising fees: Hey, it's not a $30
price hike. It's a surcharge! This comes on the heels of
checked-baggage fees, blanket fees, extra fees for window
and aisle seats, and "snack packs" priced at exorbitant
markups. Hotels in Las Vegas and elsewhere, meanwhile, are
imposing "resort fees" for the use of facilities (in other
words, raising room rates without admitting they're
raising room rates). The chiseling dishonesty of these
tactics rankles, and every one feels like another nail in
the coffin of travel as something liberating and
pleasurable.
</t>
produces the desired line-numbering:
1 The biggest airlines are imposing "peak travel surcharges"
2 this summer. In other words, they're going to raise fees
3 without admitting they're raising fees: Hey, it's not a $30
4 price hike. It's a surcharge! This comes on the heels of
5 checked-baggage fees, blanket fees, extra fees for window
6 and aisle seats, and "snack packs" priced at exorbitant
7 markups. Hotels in Las Vegas and elsewhere, meanwhile, are
8 imposing "resort fees" for the use of facilities (in other
9 words, raising room rates without admitting they're
10 raising room rates). The chiseling dishonesty of these
11 tactics rankles, and every one feels like another nail in
12 the coffin of travel as something liberating and
13 pleasurable.

A line in an XML file is not really the same as an element. In your first example you don't really count the lines - but the number of elements.
An XML file could look like this:
<cheeseCollection>
<cheese country="Cyprus">Gbejna</cheese><cheese>Liptauer</cheese><cheese>Anari</cheese>
</cheeseCollection>
Or the exact same XML file can look like this:
<cheeseCollection>
<cheese
country="Cyprus">Gbejna</cheese>
<cheese>Liptauer</cheese>
<cheese>Anari</cheese>
</cheeseCollection>
which the XSLT processor will interpret exactly the same way - it does not care about the line breaks.
Therefore it's hard to show line numbers in the way you want using XSLT - it's not really meant for that kind of parsing.
Someone correct me if I'm wrong, but I'd say you would need Javascript or some other scripting language to do what you want.

Thanks for the responses guys - yup, you're totally correct, some external function is the only way to get this behaviour in XSLT. For those searching, this is how I did this when using a compiled transform in .NET 3.5:
Create a helper class for your function(s)
/// <summary>
/// Provides functional support to XSLT
/// </summary>
public class XslHelper
{
/// <summary>
/// Initialise the line counter value to 1
/// </summary>
Int32 counter = 1;
/// <summary>
/// Increment and return the line count
/// </summary>
/// <returns></returns>
public Int32 IncrementCount()
{
return counter++;
}
}
Add an instance to an args list for XSLT
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(XmlReader.Create(s));
XsltArgumentList xslArg = new XsltArgumentList();
XslHelper helper = new XslHelper();
xslArg.AddExtensionObject("urn:helper", helper);
xslt.Transform(xd.CreateReader(), xslArg, writer);
Use it in your XSLT
Put this in the stylesheet declaration element:
xmlns:helper="urn:helper"
Then use like so:
<xsl:value-of select="helper:IncrementCount()" />
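Put together, a stylesheet using the helper might look like the sketch below. It assumes the nested Records/Record structure from the question and the helper object registered under urn:helper as shown above, so it only runs through XslCompiledTransform with that argument list. Note that XSLT does not in general promise an evaluation order for extension functions with side effects, so treat the strictly increasing numbering as behaviour of this particular setup rather than a guarantee.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:helper="urn:helper"
    exclude-result-prefixes="helper">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Outer records: each one takes the next counter value -->
    <xsl:for-each select="Records/Record">
      <xsl:value-of select="helper:IncrementCount()"/>
      <xsl:text>&#xA;</xsl:text>
      <!-- Nested records keep counting instead of restarting at 1 -->
      <xsl:for-each select="Records/Record">
        <xsl:value-of select="helper:IncrementCount()"/>
        <xsl:text>&#xA;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```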

Generally, position() is referring to the number of the current node relative to the entire batch of nodes that is being processed currently.
With your "nested for-each" example, consecutive numbering can easily be achieved when you stop nesting for-each constructs and just select all desired elements at once.
With this XML:
<a><b><c/><c/></b><b><c/></b></a>
a loop construct like this
<xsl:for-each select="a/b">
<xsl:value-of select="position()" />
<xsl:for-each select="c">
<xsl:value-of select="position()" />
</xsl:for-each>
</xsl:for-each>
will result in
11221
bccbc // referred-to nodes
but you could simply do this instead:
<xsl:for-each select="a/b/c">
<xsl:value-of select="position()" />
</xsl:for-each>
and you would get
123
ccc // referred-to nodes
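If the nesting has to stay, one possible alternative (not part of the answer above) is xsl:number with level="any", which counts matching nodes in document order regardless of the current for-each. A minimal sketch against the same a/b/c input, assuming both b and c should be numbered:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="a/b">
      <!-- Counts every b or c at or before this node in document order -->
      <xsl:number level="any" count="b | c"/>
      <xsl:for-each select="c">
        <xsl:number level="any" count="b | c"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this yields 12345 (nodes b c c b c). The numbering follows the source document, not the output, so it only matches an output line count when every counted node produces exactly one line.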

Related

xsl 3.0: How to process certain child elements first in xsl:apply-templates, then the remainder (overriding document order)

Assume my xml input is a MFMATR element with a few child elements, such as: TRLIST, INTRO, and SBLIST -- in that document order. I am converting to HTML.
I have a template that matches on the MFMATR element, and wants to run xsl:apply-templates on the 3 child elements, but I want INTRO to be processed first (listed first in the HTML). The other two (TRLIST and SBLIST) should keep their relative document order, as long as INTRO comes before both of them.
So I'd like to run <xsl:apply-templates select="INTRO, *"> but not have INTRO matched twice. (Using this syntax with xsl 3.0 causes dupes for me.) I also don't want to explicitly list every tag in the select expression, so unknown tags will still be processed.
A 2nd real life example is this: <xsl:apply-templates select="TITLE, CHGDESC, *"/>. Again, right now that is causing dupes I don't want.
I am using Saxon.
So I'd like to run <xsl:apply-templates select="INTRO, *"> but not have INTRO matched twice
Try:
<xsl:apply-templates select="INTRO, * except INTRO"/>
This seems to work. If someone has a better answer, let me know and I will change it.
There is no DRY violation here -- no repeated element names or variable names. I want it to look clean at all the call sites I will have.
It seems idiomatic to me since the function was pulled from w3's own website!
<xsl:template match="MFMATR">
<!-- Process INTRO first, no matter where it appears -->
<xsl:variable name="nodes" select="INTRO, *"/>
<xsl:apply-templates select="kp:distinct_nodes_stable($nodes)"/>
</xsl:template>
<xsl:template match="INTRO">
<xsl:variable name="nodes" select="TITLE, CHGDESC, *"/>
<xsl:apply-templates select="kp:distinct_nodes_stable($nodes)"/>
</xsl:template>
<!-- Discard duplicate elements in $seq, but keep their ordering -->
<!-- Adapted from https://www.w3.org/TR/xpath-functions/#func-distinct-nodes-stable -->
<xsl:function name="kp:distinct_nodes_stable" as="node()*">
<xsl:param name="seq" as="node()*"/>
<xsl:sequence select="fold-left($seq, (),
function($foundSoFar as node()*, $this as node()) as node()* {
if ($foundSoFar intersect $this)
then $foundSoFar
else ($foundSoFar, $this)
}) "/>
</xsl:function>

Looping through an xpath getting the right iteration each time in xslt 2

I have a condition where I need to loop at least once, so I have the following XSL code. However, this doesn't work, as it always gets the last iteration's value. How can I tweak this so it gets the right iteration on each loop?
<xsl:variable name='count0'>
<xsl:choose>
<xsl:when test='count($_BoolCheck/BoolCheck[1]/CheckBoolType) = 0'>
<xsl:value-of select="1"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select='count($_BoolCheck/BoolCheck[1]/CheckBoolType)'/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:for-each select="1 to $count0">
<xsl:variable name='_LoopVar_2_0' select='$_BoolCheck/BoolCheck[1]/CheckBoolType[position()=$count0]'/>
<e>
<xsl:attribute name="n">ValueIsTrue</xsl:attribute>
<xsl:attribute name="m">f</xsl:attribute>
<xsl:attribute name="d">f</xsl:attribute>
<xsl:if test="(ctvf:isTrue($_LoopVar_2_0/CheckBoolType[1]))">
<xsl:value-of select="'Value True'"/>
</xsl:if>
</e>
</xsl:for-each>
The xml file is as follows:
<BoolCheck>
<CheckBoolType>true</CheckBoolType>
<CheckBoolType>false</CheckBoolType>
<CheckBoolType>1</CheckBoolType>
<CheckBoolType>0</CheckBoolType>
<CheckBoolType>True</CheckBoolType>
<CheckBoolType>False</CheckBoolType>
<CheckBoolType>TRUE</CheckBoolType>
<CheckBoolType>FALSE</CheckBoolType>
</BoolCheck>
In this case I need to iterate through each occurrence of CheckBoolType and produce a corresponding number of values. However, if there were no CheckBoolType occurrences in the above example, I would still like to enter the for-each loop at least once. I hope that clarifies it a little more.
First observation: your declaration of $count0 can be replaced by
<xsl:variable name="temp" select="count($_BoolCheck/BoolCheck[1]/CheckBoolType)"/>
<xsl:variable name="count0" select="if ($temp=0) then 1 else $temp"/>
(Sorry if that seems irrelevant, but my first step in debugging code is always to simplify it. It makes the bugs much easier to find).
When you do this you can safely replace the predicate [position()=$count0] by [$count0], because $count0 is now an integer rather than a document node. (Even better, declare it as an integer using as='xs:integer' on the xsl:variable declaration.)
But hang on, $count0 is the number of elements being processed, so CheckBoolType[$count0] will always select the last one. That's surely not what you want.
This brings us to another bug in your code. The value of the variable $_LoopVar_2_0 is an element node named CheckBoolType. The expression $_LoopVar_2_0/CheckBoolType[1] is looking for children of this element that are also named CheckBoolType. There are no such children, so the expression selects an empty sequence, so the boolean test is always false.
At this stage I would like to show you some correct code to achieve your desired output. Unfortunately you haven't shown us the desired output. I can't reverse engineer the requirement from (a) your incorrect code, and (b) your prose description of the algorithm you are trying to implement.
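For readers who want a starting point anyway, here is one possible shape, under assumptions the question does not confirm: the desired output is one e element per CheckBoolType (or a single e when there are none), the input document's root is the BoolCheck element shown above, and the unknown ctvf:isTrue function is replaced by a hypothetical case-insensitive check for 'true' or '1'. The out wrapper element is also made up for the sketch.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <xsl:template match="/BoolCheck">
    <xsl:variable name="items" select="CheckBoolType"/>
    <out>
      <!-- Loop at least once, even when $items is empty -->
      <xsl:for-each select="1 to max((1, count($items)))">
        <!-- Capture the loop index: inside a predicate "." means something else -->
        <xsl:variable name="i" as="xs:integer" select="."/>
        <xsl:variable name="item" select="$items[$i]"/>
        <e n="ValueIsTrue" m="f" d="f">
          <!-- Hypothetical stand-in for ctvf:isTrue -->
          <xsl:if test="lower-case($item) = ('true', '1')">Value True</xsl:if>
        </e>
      </xsl:for-each>
    </out>
  </xsl:template>
</xsl:stylesheet>
```

The two changes that matter are indexing by the loop variable ($items[$i]) instead of by the total count, and testing the selected element itself rather than a nonexistent CheckBoolType child of it.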

Inconsistency in nodeset test?

I have an XML with top-level elements in this vein:
<chapter template="one"/>
<chapter template="two"/>
<chapter template="one"/>
<chapter template="one"/>
<chapter template="two"/>
<chapter template="one"/>
I'm processing these elements by looping through them with a choose statement:
<xsl:variable name="layout" select="@template"/>
<xsl:choose>
<xsl:when test="contains($layout, 'one')">
<xsl:call-template name="processChapterOne"/>
</xsl:when>
<xsl:when test="contains($layout, 'two')">
<xsl:call-template name="processChaptertwo"/>
</xsl:when>
<xsl:otherwise/>
</xsl:choose>
This works correctly. But now I'm trying to do some conditional processing, so I'm trying to find the first chapter in the list:
<xsl:when test="count(preceding-sibling::*[($layout = 'one')]) = '0'">
<xsl:call-template name="processChapterOne"/>
</xsl:when>
Here's when things get weird. My test never becomes true: the value of count(...) is 4 for the first chapter in the list, and increments from there. It looks like it counts all of the top-level elements, and not just the ones named 'chapter'.
When I change the code to this:
<xsl:when test="count(preceding-sibling::*[(@template = 'one')]) = '0'">
<xsl:call-template name="processChapterOne"/>
</xsl:when>
it works correctly. So I've replaced a variable with a direct reference. I can't figure out why this would make a difference. What could cause this?
The not working and working cases are actually very different:
Not working: In preceding-sibling::*[$layout = 'one'], $layout always has the value it was given by the <xsl:variable name="layout" select="@template"/> statement, so the predicate is either true for every preceding sibling or false for every one of them.
Working: In preceding-sibling::*[@template = 'one'], @template is evaluated against each preceding-sibling context node in turn, so it varies with that node's template attribute.
*[(@template = 'one')]
Above means: count all nodes where attribute template equals the text one.
*[($layout = 'one')]
Above means: count all nodes where variable layout equals the text one.
I think with the question you raised $layout is not filled with the text one, but it does a xsl:call-template. Maybe something is going wrong here?
Besides that, if you don't want to count all nodes but only the chapter nodes, do this:
chapter[($layout = 'one')]
chapter[(@template = 'one')]
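To make the difference concrete, here is a minimal sketch that keeps the variable but compares it against each sibling's own attribute. It assumes the chapter elements sit under some wrapper element (the question does not show it) and simply prints whether each chapter is the first with its template value.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="chapter">
    <xsl:variable name="layout" select="@template"/>
    <xsl:choose>
      <!-- @template is read from each preceding sibling; $layout is fixed -->
      <xsl:when test="not(preceding-sibling::chapter[@template = $layout])">
        <xsl:value-of select="concat('first ', $layout, '&#xA;')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="concat('later ', $layout, '&#xA;')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Suppress whitespace text between elements in the text output -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

For the six chapters in the question this reports "first" for the first one and the first two, and "later" for the remaining four.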

Select substrings dependent on context

I have this piece of XML and want to get the number immediately after Chapter
<para>Insolvency Rules, r 12.12, which gives the court a broad discretion, unfettered by the English equivalent of the heads of Order 11, r 1(1) (which are now to be found, in England, in CPR, Chapter 6, disapplied in the insolvency context by Insolvency Rules, r 12.12(1)). </para>
When I used this XSLT transform
<xsl:value-of select="translate(substring-after(current(),'Chapter'), translate(substring-after(current(),'Chapter'),'0123456789',''), '')"/>
I get this output
612121
But I want just 6.
Please let me know how I should do it.
I don't want to use a statement like
<xsl:value-of select="substring-before(substring-after(current(),'Chapter'), ',')"/>
as the chapter number will be different in each instance, between 1 and 15.
Try this:
<xsl:variable name="vS" select="concat(substring-after(current(),'Chapter '),'Z')"/>
<xsl:value-of select=
"substring-before(translate($vS,translate($vS,'0123456789',''),'Z'),'Z')"/>
This is based on: https://stackoverflow.com/a/4188249/2115381 Thanks to @Dimitre Novatchev
Update: If the quantity of space after the "Chapter" is not known you can use something like this:
<xsl:variable name="vS" select="concat(substring-after(current(),'Chapter'),'Z')"/>
<xsl:value-of select=
" translate(
substring-before(translate($vS,translate($vS,' 0123456789',''),'Z'),'Z')
, ' ','')"/>
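If an XSLT 2.0 processor is available (the question does not say which version is in use), a regular expression may be easier to read than the double translate. A minimal sketch, assuming the text lives in para elements as in the sample:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="para">
    <!-- Capture the digits that follow "Chapter" and optional whitespace -->
    <xsl:analyze-string select="." regex="Chapter\s*(\d+)">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Non-matching text is dropped because there is no xsl:non-matching-substring branch. If a paragraph mentions "Chapter" more than once, every chapter number is emitted, so add a separator or take only the first match if that matters.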

XSLT xsl:sequence. What is it good for..?

I know the following question is a little bit of beginners but I need your help to understand a basic concept.
I would like to say first that I'm a XSLT programmer for 3 years and yet there are some new and quite basics things I've been learning here I never knew (In my job anyone learns how to program alone, there is no course involved).
My question is:
What is the usage of xsl:sequence?
I have been using xsl:copy-of in order to copy a node as is, xsl:apply-templates in order to modify the nodes I selected, and xsl:value-of for simple text.
I never had the necessity using xsl:sequence. I would appreciate if someone can show me an example of xsl:sequence usage which is preferred or cannot be achieved without the ones I noted above.
One more thing, I have read about the xsl:sequence definition of course, but I couldn't infer how it is useful.
<xsl:sequence> on an atomic value (or sequence of atomic values) is the same as <xsl:copy-of>: both just return a copy of their input. The difference comes when you consider nodes.
If $n is a single element node, eg as defined by something like
<xsl:variable name="n" select="/html"/>
Then
<xsl:copy-of select="$n"/>
Returns a copy of the node, it has the same name and child structure but it is a new node with a new identity (and no parent).
<xsl:sequence select="$n"/>
Returns the node $n itself. The node returned has the same parent as $n and is equal to it by the XPath is operator.
The difference is almost entirely masked in traditional (XSLT 1 style) template usage, because you never get access to the result of either operation: the result of the constructor is implicitly copied to the output tree, so the fact that xsl:sequence doesn't make a copy is hidden.
<xsl:template match="a">
<x>
<xsl:sequence select="$n"/>
</x>
</xsl:template>
is the same as
<xsl:template match="a">
<x>
<xsl:copy-of select="$n"/>
</x>
</xsl:template>
Both make a new element node and copy the result of the content as children of the new node x.
However the difference is quickly seen if you use functions.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="data:,f">
<xsl:variable name="s">
<x>hello</x>
</xsl:variable>
<xsl:template name="main">
::
:: <xsl:value-of select="$s/x is f:s($s/x)"/>
:: <xsl:value-of select="$s/x is f:c($s/x)"/>
::
:: <xsl:value-of select="count(f:s($s/x)/..)"/>
:: <xsl:value-of select="count(f:c($s/x)/..)"/>
::
</xsl:template>
<xsl:function name="f:s">
<xsl:param name="x"/>
<xsl:sequence select="$x"/>
</xsl:function>
<xsl:function name="f:c">
<xsl:param name="x"/>
<xsl:copy-of select="$x"/>
</xsl:function>
</xsl:stylesheet>
Produces
$ saxon9 -it main seq.xsl
<?xml version="1.0" encoding="UTF-8"?>
::
:: true
:: false
::
:: 1
:: 0
::
Here the results of xsl:sequence and xsl:copy-of are radically different.
The most common use case for xsl:sequence is to return a result from xsl:function.
<xsl:function name="f:get-customers">
<xsl:sequence select="$input-doc//customer"/>
</xsl:function>
But it can also be handy in other contexts, for example
<xsl:variable name="x" as="element()*">
<xsl:choose>
<xsl:when test="$something">
<xsl:sequence select="//customer"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="//supplier"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
The key thing here is that it returns references to the original nodes, it doesn't make new copies.
Well to return a value of a certain type you use xsl:sequence as xsl:value-of despite its name always creates a text node (since XSLT 1.0).
So in a function body you use
<xsl:sequence select="42"/>
to return an xs:integer value, you would use
<xsl:sequence select="'foo'"/>
to return an xs:string value and
<xsl:sequence select="xs:date('2013-01-16')"/>
to return an xs:date value and so on. Of course you can also return sequences with e.g. <xsl:sequence select="1, 2, 3"/>.
You wouldn't want to create a text node or even an element node in these cases in my view as it is inefficient.
So that is my take, with the new schema based type system of XSLT and XPath 2.0 a way is needed to return or pass around values of these types and a new construct was needed.
[edit]Michael Kay says in his "XSLT 2.0 and XPath 2.0 programmer's reference" about xsl:sequence: "This innocent looking instruction introduced in XSLT 2.0 has far reaching effects on the capability of the XSLT language, because it means that XSLT instructions and sequence constructors (and hence functions and templates) become capable of returning any value allowed by the XPath data model. Without it, XSLT instructions could only be used to create new nodes in a result tree, but with it, they can also return atomic values and references to existing nodes.".
Another use is to create a tag only if it has a child. An example makes this clearer:
<a>
<b>node b</b>
<c>node c</c>
</a>
Somewhere in your XSLT :
<xsl:variable name="foo">
<xsl:if test="b"><d>Got a "b" node</d></xsl:if>
<xsl:if test="c"><d>Got a "c" node</d></xsl:if>
</xsl:variable>
<xsl:if test="$foo/node()">
<wrapper><xsl:sequence select="$foo"/></wrapper>
</xsl:if>
You may see the demo here : http://xsltransform.net/eiZQaFz
It is way better than testing each tag like this :
<xsl:if test="b|c">...</xsl:if>
Because you would end up editing it in two places. Also, the processing speed would depend on which tags are in your input: if the one that is present comes last in your test, the engine checks for all the others first. As $foo/node() is an idiom for "is there a child node?", the engine can optimize it. Doing so makes life easier for everyone.