XSLT grouping, for-each and implicit loop. How to eliminate duplicates? - xslt

I hope a guru can help me figure out the following problem.
I have two XML files. The first one is text.xml:
<text>
<ref>Author1, Title1, Date1</ref>
<ref>Author75, Title75, Date2</ref>
<ref>Author2, Title2, Date2</ref>
<ref>Author3, Title3, Date3</ref>
</text>
And the second one looks like this (list.xml):
<list>
<bibl xml:id="1"><author>Author1</author><date>Date1</date></bibl>
<bibl xml:id="2"><author>Author2</author><date>Date2</date></bibl>
<bibl xml:id="3"><author>Author3</author><date>Date3</date></bibl>
</list>
I want to query text.xml and check it against list.xml, adding the @xml:id (from list.xml) to each <ref> (from text.xml) which contains the same author and date. If there is no match, the original <ref> should just be copied.
So I want to obtain:
<ref xml:id="1">Author1, Title1, Date1</ref>
<ref>Author75, Title75, Date2</ref>
<ref xml:id="2">Author2, Title2, Date2</ref>
etc.
My XSLT identifies all the correspondences correctly:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ref">
<xsl:variable name="ref" select="."/>
<xsl:for-each select="document('list.xml')//bibl">
<xsl:variable name="bibl" select="."/>
<xsl:variable name="author" select="author"/>
<xsl:variable name="date" select="date"/>
<xsl:choose>
<xsl:when test="contains($ref, $author) and contains($ref, $date)">
<ref>
<xsl:attribute name="xml:id">
<xsl:value-of select="$bibl/@xml:id"/>
</xsl:attribute>
<xsl:value-of select="$ref"/>
</ref>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="$ref"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
But when there is no correspondence, it doesn't just copy the <ref> once: every <ref> is output as many times as there are <bibl> nodes in the second file.
So the problem is in <xsl:otherwise><xsl:copy-of select="$ref"/></xsl:otherwise>.
Any ideas how I can obtain only the distinct values I need? I know it must actually be very simple, and I have tried key, generate-id, for-each-group and distinct-values, but I can't figure it out.

The problem is that you are creating a ref element on every iteration of the for-each loop, whether there is a match or not.
What you need to do in this case is create the ref element outside of the for-each, and then create the id attribute inside the loop only for the matching element:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ref">
<xsl:variable name="ref" select="."/>
<ref>
<xsl:apply-templates select="@*"/>
<xsl:for-each select="document('list.xml')//bibl">
<xsl:variable name="bibl" select="."/>
<xsl:variable name="author" select="author"/>
<xsl:variable name="date" select="date"/>
<xsl:choose>
<xsl:when test="contains($ref, $author) and contains($ref, $date)">
<xsl:attribute name="xml:id">
<xsl:value-of select="$bibl/@xml:id"/>
</xsl:attribute>
</xsl:when>
</xsl:choose>
</xsl:for-each>
<xsl:apply-templates select="node()"/>
</ref>
</xsl:template>
</xsl:stylesheet>
When applied to your sample XML, the following is output
<text>
<ref xml:id="1">Author1, Title1, Date1</ref>
<ref>Author75, Title75, Date2</ref>
<ref xml:id="2">Author2, Title2, Date2</ref>
<ref xml:id="3">Author3, Title3, Date3</ref>
</text>
However, your current method is not very efficient, as for each ref element you are iterating over all bibl elements. Another approach would be to extract the author and date from the ref elements, and then look up the bibl element directly
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ref">
<xsl:variable name="author" select="normalize-space(substring-before(., ','))"/>
<xsl:variable name="date" select="normalize-space(substring-after(substring-after(., ','), ','))"/>
<ref>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="document('list.xml')//bibl[author=$author][date=$date]"/>
<xsl:apply-templates select="node()"/>
</ref>
</xsl:template>
<xsl:template match="bibl">
<xsl:attribute name="xml:id">
<xsl:value-of select="@xml:id"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
This should also give the same results.
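If list.xml is large, the lookup could also go through a key. This is only a sketch, not part of the answer above: the key name bibl-by-author-date and the '|' separator are made up for illustration, and it assumes every ref has the form "Author, Title, Date". In XSLT 1.0, key() only searches the document that contains the context node, hence the xsl:for-each that switches the context to list.xml.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- hypothetical key: index every bibl by "author|date" -->
  <xsl:key name="bibl-by-author-date" match="bibl" use="concat(author, '|', date)"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="ref">
    <!-- build the lookup value while the ref is still the context node -->
    <xsl:variable name="lookup" select="concat(
        normalize-space(substring-before(., ',')), '|',
        normalize-space(substring-after(substring-after(., ','), ',')))"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- switch the context to list.xml so that key() searches that document -->
      <xsl:for-each select="document('list.xml')">
        <xsl:for-each select="key('bibl-by-author-date', $lookup)[1]">
          <xsl:attribute name="xml:id">
            <xsl:value-of select="@xml:id"/>
          </xsl:attribute>
        </xsl:for-each>
      </xsl:for-each>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The [1] guards against two bibl entries sharing the same author and date, in which case only the first id is used.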

EDIT:
Hmm, I focused too much on the XSL syntax; I should have seen this earlier.
You have an implicit outer loop over each ref and an inner loop over each bibl, so you generate one element per bibl for every ref, whether or not there is a match.
So, instead of the xsl:otherwise, you need a check after the for-each loop to see whether there was no match, and do the copy-of only if necessary.
I'm not sure how to do the check, though. Maybe by using position() and count() of the generated <ref>s. Sorry, I don't have any more time to think about this right now.
Not really an explanation, but a workaround:
<xsl:otherwise>
<ref>
<xsl:value-of select="$ref"/>
</ref>
</xsl:otherwise>
My guess is that the problem lies in $ref, which is not an XPath expression at that point (if I remember XSLT correctly).
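One way to do the "was there a match?" check without a loop is to select the matching bibl elements into a variable first and then test whether that variable is empty. A minimal sketch, assuming the same text.xml and list.xml as in the question; the variable name match is made up, and contains() is a substring test, so Author1 would also match a ref that mentions Author10.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="ref">
    <xsl:variable name="ref" select="."/>
    <!-- every bibl whose author and date both occur in the text of this ref -->
    <xsl:variable name="match"
        select="document('list.xml')//bibl[contains($ref, author) and contains($ref, date)]"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- a non-empty node-set is true, so the attribute is added only on a match -->
      <xsl:if test="$match">
        <xsl:attribute name="xml:id">
          <xsl:value-of select="$match[1]/@xml:id"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Each ref is written exactly once this way, with or without the attribute.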

Related

XSLT – creating a network from all children elements

With XSLT 2.0, I am trying to create a list of relations between all children of given elements, in a document such as:
<doc>
<part1>
<name>John</name>
<name>Paul</name>
<name>George</name>
<name>Ringo</name>
<place>Liverpool</place>
</part1>
<part2>
<name>Romeo</name>
<name>Romeo</name>
<name>Juliet</name>
<fam>Montague</fam>
<fam>Capulet</fam>
</part2>
</doc>
The result I would like to obtain, ideally by conflating and weighing the identical relations, would be (in whatever order) something like:
<doc>
<part1>
<rel><name>John</name><name>Paul</name></rel>
<rel><name>John</name><name>George</name></rel>
<rel><name>John</name><name>Ringo</name></rel>
<rel><name>Paul</name><name>George</name></rel>
<rel><name>Paul</name><name>Ringo</name></rel>
<rel><name>George</name><name>Ringo</name></rel>
<rel><name>John</name><place>Liverpool</place></rel>
<rel><name>Paul</name><place>Liverpool</place></rel>
<rel><name>George</name><place>Liverpool</place></rel>
<rel><name>Ringo</name><place>Liverpool</place></rel>
</part1>
<part2>
<rel weight="2"><name>Romeo</name><name>Juliet</name></rel>
<rel weight="2"><name>Romeo</name><fam>Montague</fam></rel>
<rel weight="2"><name>Romeo</name><fam>Capulet</fam></rel>
<rel><name>Juliet</name><fam>Montague</fam></rel>
<rel><name>Juliet</name><fam>Capulet</fam></rel>
<rel><fam>Montague</fam><fam>Capulet</fam></rel>
</part2>
</doc>
But I'm not sure how to proceed. Many thanks in advance for your help.
You still haven't explained the logic that needs to be applied here, so this is based largely on a guess:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="doc/*">
<!-- first pass-->
<xsl:variable name="unique-items">
<xsl:for-each-group select="*" group-by="concat(name(), '|', .)">
<item name="{name()}" count="{count(current-group())}" value="{.}"/>
</xsl:for-each-group>
</xsl:variable>
<!-- output -->
<xsl:copy>
<xsl:for-each select="$unique-items/item">
<xsl:variable name="left" select="."/>
<xsl:for-each select="following-sibling::item">
<xsl:variable name="weight" select="$left/@count * @count" />
<rel>
<xsl:if test="$weight gt 1">
<xsl:attribute name="weight" select="$weight"/>
</xsl:if>
<xsl:apply-templates select="$left | ." />
</rel>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<xsl:template match="item">
<xsl:element name="{@name}">
<xsl:value-of select="@value"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The idea here is to remove duplicates in the first pass, then enumerate all combinations in the second (final) pass. The weight is computed by multiplying the number of occurrences of each member of a combination pair and shown only when it exceeds 1.
At least the combinatoric part of your problem could be solved with the following XSLT script. It does not solve the elimination of duplicates, but that could possibly be done in a second transformation.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- standard copy template -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="doc/*">
<xsl:copy>
<xsl:variable name="l" select="./*"/>
<xsl:for-each select="$l">
<xsl:variable name="a" select="."/>
<xsl:variable name="posa" select="position()"/>
<xsl:variable name="namea" select="name()"/>
<xsl:for-each select="$l">
<xsl:if test="position() > $posa and (. != $a or name() != $namea)">
<rel>
<xsl:copy-of select="$a"/>
<xsl:copy-of select="."/>
</rel>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to the first part of your example, this produces:
<part1>
<rel><name>John</name><name>Paul</name></rel>
<rel><name>John</name><name>George</name></rel>
<rel><name>John</name><name>Ringo</name></rel>
<rel><name>John</name><place>Liverpool</place></rel>
<rel><name>Paul</name><name>George</name></rel>
<rel><name>Paul</name><name>Ringo</name></rel>
<rel><name>Paul</name><place>Liverpool</place></rel>
<rel><name>George</name><name>Ringo</name></rel>
<rel><name>George</name><place>Liverpool</place></rel>
<rel><name>Ringo</name><place>Liverpool</place></rel>
</part1>
Which seems about correct. I have no idea whether the duplicate elimination (or weighting, as you call it) could be done in the same transformation.
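For what it's worth, in XSLT 1.0 the duplicate elimination and the weighting can probably be folded into the same transformation with a key (Muenchian grouping). The sketch below rests on assumptions about what the asker wants: the key name item and the '|' separator are made up, duplicates are children of the same part with the same element name and text, and the weight is the product of the two occurrence counts.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- hypothetical key: index each child by its parent, element name and text -->
  <xsl:key name="item" match="doc/*/*"
           use="concat(generate-id(..), '|', name(), '|', .)"/>

  <!-- standard copy template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="doc/*">
    <xsl:copy>
      <!-- first occurrence of every distinct (name, value) pair in this part -->
      <xsl:variable name="l" select="*[generate-id() =
          generate-id(key('item', concat(generate-id(..), '|', name(), '|', .))[1])]"/>
      <xsl:for-each select="$l">
        <xsl:variable name="a" select="."/>
        <xsl:variable name="posa" select="position()"/>
        <!-- how often the left-hand item occurs in this part -->
        <xsl:variable name="counta"
            select="count(key('item', concat(generate-id(..), '|', name(), '|', .)))"/>
        <xsl:for-each select="$l">
          <xsl:if test="position() &gt; $posa">
            <xsl:variable name="weight" select="$counta *
                count(key('item', concat(generate-id(..), '|', name(), '|', .)))"/>
            <rel>
              <!-- show the weight only when it exceeds 1 -->
              <xsl:if test="$weight &gt; 1">
                <xsl:attribute name="weight">
                  <xsl:value-of select="$weight"/>
                </xsl:attribute>
              </xsl:if>
              <xsl:copy-of select="$a"/>
              <xsl:copy-of select="."/>
            </rel>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

For part2 of the sample, Romeo occurs twice, so the three pairs that involve Romeo should come out with weight="2" and the rest without a weight.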

Use xsl:number inside if

There is a list:
<nodes>
<node attr='1'/>
<node attr='0'/>
<node attr='1'/>
<node attr='1'/>
</nodes>
I need to apply templates to all nodes and number them:
<xsl:apply-templates select='nodes/node'>
<xsl:if test='@attr=1'>
<xsl:number/>
</xsl:if>
</xsl:apply-templates>
But the result I get is not 123; it is 134. How can I fix this in XSLT 1.0? Is there another way to number them? position() doesn't help, and
<xsl:apply-templates select='nodes/node[@attr=1]'>
<xsl:if test='@attr=1'>
<xsl:number/>
</xsl:if>
</xsl:apply-templates>
doesn't help either.
Firstly, you have an error in your XSLT:
<xsl:apply-templates select='nodes/node'>
<xsl:if test='@attr=1'> <xsl:number/>
</xsl:if>
</xsl:apply-templates>
You can't have an xsl:if within an xsl:apply-templates. You need a matching xsl:template and put the code in there...
<xsl:apply-templates select="nodes/node" />
<xsl:template match="node">
<xsl:if test='@attr=1'>
<xsl:number/>
</xsl:if>
</xsl:template>
In fact, you could do away with the xsl:if here, and just have the test in the template match
<xsl:template match="node[@attr=1]">
<xsl:number/>
</xsl:template>
But to answer your question, you probably need to use the count attribute on the xsl:number element to count only the elements you want
<xsl:number count="node[@attr=1]"/>
Here is the full XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="nodes/node"/>
</xsl:template>
<xsl:template match="node[@attr=1]">
<xsl:number count="node[@attr=1]"/>
</xsl:template>
<xsl:template match="node"/>
</xsl:stylesheet>
When applied to your XML, the result is 123.
This says 123. Is this what you were after?
<xsl:for-each select="nodes/node[@attr='1']">
<xsl:value-of select="position()"/>
</xsl:for-each>
It is not quite clear what you are trying to achieve. I presume you need to count the number of nodes that have the attribute set to 1. In that case, use the count function:
<xsl:value-of select="count(nodes/node[@attr='1'])" />
If you need to output the position of each node within the subset matching the condition, then for-each is probably the way to go:
<xsl:for-each select="nodes/node[@attr='1']">
<xsl:value-of select="position()" />
</xsl:for-each>
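If the nodes are processed by a template rather than in a for-each, another XSLT 1.0 option is to count the preceding siblings that satisfy the same condition. This is a sketch under the assumption that all node elements are siblings, as in the sample; it isn't taken from the answers above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="nodes/node[@attr=1]"/>
  </xsl:template>

  <xsl:template match="node">
    <!-- number = earlier siblings with attr=1, plus one for the current node -->
    <xsl:value-of select="count(preceding-sibling::node[@attr=1]) + 1"/>
  </xsl:template>
</xsl:stylesheet>
```

The number here depends only on the source tree, not on which nodes were selected for processing, so the template gives the same numbers even if it is applied to a different selection.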

XSL - How to match consecutive comma-separated tags

I'm trying to match a series of xml tags that are comma separated, and to then apply an xslt transformation on the whole group of nodes plus text. For example, given the following partial XML:
<p>Some text here
<xref id="1">1</xref>,
<xref id="2">2</xref>,
<xref id="3">3</xref>.
</p>
I would like to end up with:
<p>Some text here <sup>1,2,3</sup>.</p>
A much messier alternate would also be acceptable at this point:
<p>Some text here <sup>1</sup><sup>,</sup><sup>2</sup><sup>,</sup><sup>3</sup>.</p>
I have the transformation to go from a single xref to a sup:
<xsl:template match="xref">
<sup><xsl:apply-templates/></sup>
</xsl:template>
But I'm at a loss as to how to match a group of nodes separated by commas.
Thanks.
Update: Thanks to @Flynn1179 who alerted me that the solution wasn't producing exactly the wanted output, I have slightly modified it. Now the wanted "good" format is produced.
This XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()[1]|@*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match=
"xref[not(preceding-sibling::node()[1]
[self::text() and starts-with(.,',')]
)
]">
<xsl:variable name="vBreakText" select=
"following-sibling::text()[not(starts-with(.,','))][1]"/>
<xsl:variable name="vPrecedingTheBreak" select=
"$vBreakText/preceding-sibling::node()"/>
<xsl:variable name="vFollowing" select=
".|following-sibling::node()"/>
<xsl:variable name="vGroup" select=
"$vFollowing[count(.|$vPrecedingTheBreak)
=
count($vPrecedingTheBreak)
]
"/>
<sup>
<xsl:apply-templates select="$vGroup" mode="group"/>
</sup>
<xsl:apply-templates select="$vBreakText"/>
</xsl:template>
<xsl:template match="text()" mode="group">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document (based on the provided one, but made more complex and interesting):
<p>Some text here
<xref id="1">1</xref>,
<xref id="2">2</xref>,
<xref id="3">3</xref>.
<ttt/>
<xref id="4">4</xref>,
<xref id="5">5</xref>,
<xref id="6">6</xref>.
<zzz/>
</p>
produces exactly the wanted, correct result:
<p>Some text here
<sup>1,2,3</sup>.
<ttt/>
<sup>4,5,6</sup>.
<zzz/>
</p>
Explanation:
We use a "fine-grained" identity rule, which processes the document node by node in document order and copies the matched node "as-is".
We override the identity rule with a template that matches any xref element that is the first in a group of xref elements, each of which (but the last one) is followed by an immediate text-node-sibling that starts with the ',' character. Here we find the first text-node-sibling that breaks the rule (its starting character isn't ',').
Then we find all the nodes in the group, using the Kayessian (after @Michael Kay) formula for the intersection of two nodesets. This formula is: $ns1[count(.|$ns2) = count($ns2)]
Then we process all nodes in the group in a mode named "group".
Finally, we apply templates (in anonymous mode) to the breaking text node (that is the first node following the group), so that the chain of processing continues.
Interesting question. +1.
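The intersection formula from the explanation above is easier to see in isolation. A minimal sketch, assuming the three-xref sample p from the question as input; the two selections behind $ns1 and $ns2 are made up purely to have two overlapping node-sets.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- hypothetical $ns1: the xref elements whose id is greater than 1 -->
    <xsl:variable name="ns1" select="/p/xref[@id &gt; 1]"/>
    <!-- hypothetical $ns2: the xref elements whose id is less than 3 -->
    <xsl:variable name="ns2" select="/p/xref[@id &lt; 3]"/>
    <!-- a node of $ns1 is kept only if adding it to $ns2 does not enlarge $ns2,
         that is, only if it is already a member of $ns2 -->
    <xsl:for-each select="$ns1[count(. | $ns2) = count($ns2)]">
      <xsl:value-of select="@id"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With ids 1, 2 and 3 in the input, $ns1 holds the xrefs 2 and 3, $ns2 holds 1 and 2, and only the xref with id 2 survives the predicate.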
Here's an XSLT 2.0 solution:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:variable name="comma-regex">^\s*,\s*$</xsl:variable>
<!-- Identity transform -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- Don't directly process xrefs that are second or later in a comma-separated series.
Note that this template has a higher default priority than the following one,
because of the predicate. -->
<xsl:template match="xref[preceding-sibling::node()[1]/
self::text()[matches(., $comma-regex)]/
preceding-sibling::*[1]/self::xref]" />
<!-- Don't directly process comma text nodes that are in the middle of a series. -->
<xsl:template match="text()[matches(., $comma-regex) and
preceding-sibling::*[1]/self::xref and following-sibling::*[1]/self::xref]" />
<!-- for xrefs that are first (or solitary) in a comma-separated series: -->
<xsl:template match="xref">
<sup>
<xsl:call-template name="process-xref-series">
<xsl:with-param name="next" select="." />
</xsl:call-template>
</sup>
</xsl:template>
<xsl:template name="process-xref-series">
<xsl:param name="next"/>
<xsl:if test="$next">
<xsl:value-of select="$next"/>
<xsl:variable name="followingXref"
select="$next/following-sibling::node()[1]/
self::text()[matches(., $comma-regex)]/
following-sibling::*[1]/self::xref"/>
<xsl:if test="$followingXref">
<xsl:text>,</xsl:text>
<xsl:call-template name="process-xref-series">
<xsl:with-param name="next" select="$followingXref"/>
</xsl:call-template>
</xsl:if>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
(This could be simplified if we could make some assumptions about the input.)
Run against the sample input you gave, the result is:
<p>Some text here
<sup>1,2,3</sup>.
</p>
The second alternative can be achieved with
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p/text()[normalize-space() = ',' and preceding-sibling::node()[1][self::xref]]">
<sup>,</sup>
</xsl:template>
<xsl:template match="xref">
<sup>
<xsl:apply-templates/>
</sup>
</xsl:template>
</xsl:stylesheet>
There's an almost trivial solution to your 'messy alternative':
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="xref">
<sup>
<xsl:apply-templates />
</sup>
</xsl:template>
<xsl:template match="text()[normalize-space(.)=',']">
<sup>,</sup>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*| node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
EDIT: I just noticed it's almost a clone of Martin's solution, except without the additional check of a preceding xref element on the commas. His is probably safer :)
And a slightly less trivial solution to your preferred result, although this only works if you only have one collection of xref tags in any p tag. You didn't mention the possibility of more than one collection, and even if there are, I would have thought it unlikely they'd be within the same containing p tag. If that can happen though, it's possible to extend it further to allow for that, although it will get a lot more complicated.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="xref[not(preceding-sibling::text()[normalize-space(.)=','])]">
<sup>
<xsl:value-of select="." />
<xsl:for-each select="following-sibling::text() | following-sibling::xref">
<xsl:if test="following-sibling::text()[substring(.,1,1)='.']">
<xsl:value-of select="normalize-space(.)" />
</xsl:if>
</xsl:for-each>
</sup>
</xsl:template>
<xsl:template match="xref | text()[normalize-space(.)=',']" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*| node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
In case you can use XSLT 2.0 (e.g. with Saxon 9 or AltovaXML or XQSharp) then here is an XSLT 2.0 solution that should produce the first output you asked for:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p">
<xsl:for-each-group select="node()" group-adjacent="self::xref or self::text()[normalize-space() = ',']">
<xsl:choose>
<xsl:when test="current-grouping-key()">
<sup>
<xsl:value-of select="current-group()/normalize-space()" separator=""/>
</sup>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>

How to use XSLT to tag specific nodes with unique, sequential, increasing integer ids?

I'm trying to use XSLT to transform a document by tagging a group of XML nodes with integer ids, starting at 0, and increasing by one for each node in the group. The XML passed into the stylesheet should be echoed out, but augmented to include this extra information.
Just to be clear about what I am talking about, here is how this transformation would be expressed using DOM:
var states = document.getElementsByTagName("state");
for (var i = 0; i < states.length; i++) {
states[i].stateNum = i; // tag the i-th state with its index
}
This is very simple with DOM, but I'm having much more trouble doing this with XSLT. The current strategy I've devised has been to start with the identity transformation, then create a global variable which selects and stores all of the nodes that I wish to number. I then create a template that matches that kind of node. The idea, then, is that in the template, I would look up the matched node's position in the global variable nodelist, which would give me a unique number that I could then set as an attribute.
The problem with this approach is that the position function can only be used with the context node, so something like the following is illegal:
<template match="state">
<variable name="stateId" select="@id"/>
<variable name="uniqueStateNum" select="$globalVariable[@id = $stateId]/position()"/>
</template>
The same is true for the following:
<template match="state">
<variable name="stateId" select="@id"/>
<variable name="stateNum" select="position($globalVariable[@id = $stateId])"/>
</template>
In order to use position() to look up the position of an element in $globalVariable, the context node must be changed.
I have found a solution, but it is highly suboptimal. Basically, in the template, I use for-each to iterate through the global variable. For-each changes the context node, so this allows me to use position() in the way I described. The problem is that this turns what would normally be an O(n) operation into an O(n^2) operation, where n is the length of the nodelist, as this requires iterating through the whole list whenever the template is matched. I think that there must be a more elegant solution.
Altogether, here is my current (slightly simplified) xslt stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:s="http://www.w3.org/2005/07/scxml"
xmlns="http://www.w3.org/2005/07/scxml"
xmlns:c="http://msdl.cs.mcgill.ca/"
version="1.0">
<xsl:output method="xml"/>
<!-- we copy them, so that we can use their positions as identifiers -->
<xsl:variable name="states" select="//s:state" />
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="s:state">
<xsl:variable name="stateId">
<xsl:value-of select="@id"/>
</xsl:variable>
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:for-each select="$states">
<xsl:if test="@id = $stateId">
<xsl:attribute name="stateNum" namespace="http://msdl.cs.mcgill.ca/">
<xsl:value-of select="position()"/>
</xsl:attribute>
</xsl:if>
</xsl:for-each>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I'd appreciate any advice anyone can offer. Thanks.
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:s="http://www.w3.org/2005/07/scxml"
>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="s:state">
<xsl:variable name="vNum">
<xsl:number level="any" count="s:state"/>
</xsl:variable>
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="stateId">
<xsl:value-of select="@id"/>
</xsl:attribute>
<xsl:attribute name="id">
<xsl:value-of select="$vNum -1"/>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<scxml xmlns="http://www.w3.org/2005/07/scxml">
<state id="Compound1">
<state id="Basic1"/>
<state id="Basic2"/>
<state id="Basic3"/>
</state>
</scxml>
produces the wanted, correct output:
<scxml xmlns="http://www.w3.org/2005/07/scxml">
<state stateId="Compound1" id="0">
<state stateId="Basic1" id="1"/>
<state stateId="Basic2" id="2"/>
<state stateId="Basic3" id="3"/>
</state>
</scxml>
Simplest approach:
<xsl:template match="s:state">
<xsl:copy>
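<xsl:apply-templates select="@*"/>
<xsl:attribute name="stateNum" namespace="http://msdl.cs.mcgill.ca/">
<xsl:value-of select="count(preceding::s:state | ancestor::s:state)" />
</xsl:attribute>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
The ancestor axis is included because the preceding axis excludes ancestors, so nested states would otherwise share a number with their parent. Not sure how efficiently your XSLT processor handles the preceding axis, so this is something to benchmark in any case.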

Using XSLT, how do I separate nodes based on their value?

I have a pretty flat XML structure that I need to reorder into categorised sections and, for the life of me, I can't figure out how to do it in XSLT (not that I'm by any means an expert.)
Basically, the original XML looks kinda like:
<things>
<thing>
<value>one</value>
<type>a</type>
</thing>
<thing>
<value>two</value>
<type>b</type>
</thing>
<thing>
<value>three</value>
<type>b</type>
</thing>
<thing>
<value>four</value>
<type>a</type>
</thing>
<thing>
<value>five</value>
<type>d</type>
</thing>
</things>
And I need to output something like:
<data>
<a-things>
<a>one</a>
<a>four</a>
</a-things>
<b-things>
<b>two</b>
<b>three</b>
</b-things>
<d-things>
<d>five</d>
</d-things>
</data>
Note that I can't output <c-things> if there aren't any <c> elements, but I do know ahead of time what the complete list of types is, and it's fairly short so handcoding templates for each type is definitely possible. It feels like I could probably hack something together using <xsl:if> and <xsl:for-each> but it also feels like there must be a more ... 'templatey' way to do it. Can anyone help?
Cheers.
As you are using Saxon, use the native XSLT 2.0 grouping.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:template match="things">
<data>
<xsl:for-each-group select="thing" group-by="type">
<xsl:element name="{concat(current-grouping-key(),'-things')}">
<xsl:for-each select="current-group()">
<xsl:element name="{current-grouping-key()}">
<xsl:value-of select="value" />
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each-group>
</data>
</xsl:template>
</xsl:stylesheet>
In XSLT 1.0 you can group with keys. This approach is called Muenchian Grouping.
The xsl:key defines an index containing thing elements, grouped by the string value of their type element. Function key() returns all nodes from the key with the specified value.
The outer xsl:for-each selects the thing elements that are the first returned by key() for their value.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:key name="thing" match="thing" use="type" />
<xsl:template match="things">
<data>
<xsl:for-each select="thing[generate-id(.)=generate-id(key('thing',type)[1])]">
<xsl:element name="{concat(type,'-things')}">
<xsl:for-each select="key('thing',type)">
<xsl:element name="{type}">
<xsl:value-of select="value" />
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</data>
</xsl:template>
</xsl:stylesheet>
The generic solution is to use an XSL key:
<xsl:key name="kThingByType" match="thing" use="type" />
<xsl:template match="things">
<xsl:copy>
<xsl:apply-templates select="thing" mode="group">
<xsl:sort select="type" />
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="thing" mode="group">
<xsl:variable name="wholeGroup" select="key('kThingByType', type)" />
<xsl:if test="generate-id() = generate-id($wholeGroup[1])">
<xsl:element name="{type}-thing">
<xsl:copy-of select="$wholeGroup/value" />
</xsl:element>
</xsl:if>
</xsl:template>
The above yields:
<things>
<a-thing>
<value>one</value>
<value>four</value>
</a-thing>
<b-thing>
<value>two</value>
<value>three</value>
</b-thing>
<d-thing>
<value>five</value>
</d-thing>
</things>
In XSLT 2, you can do this very elegantly. Say you have a template for formatting each thing before it is wrapped in an <a> element:
<xsl:template match="thing" mode="format-thing">
<xsl:value-of select="value/text()"/>
</xsl:template>
Then you can apply that to each thing of some $type to build the <a-things> elements via a function:
<xsl:function name="my:things-group" as="element()">
<xsl:param name="type" as="xs:string"/>
<xsl:param name="things" as="element(thing)*"/>
<xsl:element name="{ concat($type, '-things') }">
<xsl:for-each select="$things[type/text() eq $type]">
<xsl:element name="{ $type }">
<xsl:apply-templates select="." mode="format-thing"/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:function>
Then you can call that function for each unique type (a, b, d in your sample input) to build the entire output and you're done:
<xsl:template match="/">
<data>
<xsl:sequence select="
for $type in distinct-values(things/thing/type/text())
return my:things-group($type, /things/thing)
"/>
</data>
</xsl:template>
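The snippets above use the my: and xs: prefixes without showing where they are declared. A sketch of the enclosing stylesheet element follows; the URI bound to my: is a made-up placeholder (any non-null namespace of your own will do for a stylesheet function), while the xs: URI is the standard XML Schema namespace.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">
  <!-- the format-thing template, the my:things-group function
       and the match="/" template from the answer go here -->
</xsl:stylesheet>
```

exclude-result-prefixes keeps the two helper namespaces out of the generated <data> element.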
Of course, asking the question made it obvious...
My solution does use an <xsl:if>, but I can't see how it couldn't now I think about it. My solution looks basically like:
<xsl:if test="/things/thing/type = 'a'">
<a-things>
<xsl:apply-templates select="/things/thing[type='a']" mode="a" />
</a-things>
</xsl:if>
<xsl:template match="/things/thing[type='a']" mode="a">
<a><xsl:value-of select="value"/></a>
</xsl:template>
And repeat for the other types. I've coded it up, and it seems to work just fine.
<a-things>
<xsl:for-each select="thing[type = 'a']">
<a><xsl:value-of select="./value" /></a>
</xsl:for-each>
</a-things>
If you want to get really snazzy, replace the <a-things> and the predicate with parameters and use attribute value templates:
<xsl:param name="type" />
<xsl:element name="{$type}-things">
<xsl:for-each select="thing[type = $type]">
<xsl:element name="{$type}"><xsl:value-of select="./value" /></xsl:element>
</xsl:for-each>
</xsl:element>
And using grouping, you can do it without the if:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="things">
<data>
<xsl:for-each select="thing[not(type=preceding-sibling::thing/type)]">
<xsl:variable name="type"><xsl:value-of select="type" /></xsl:variable>
<xsl:element name="{concat($type, '-things')}">
<xsl:for-each select="../thing[type=$type]">
<xsl:element name="{$type}">
<xsl:value-of select="value" />
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</data>
</xsl:template>
</xsl:stylesheet>