efficient xslt conditional increment - xslt

In this question i asked how to perform a conditional increment. The provided answer worked, but does not scale well on huge data-sets.
The Input:
<Users>
<User>
<id>1</id>
<username>jack</username>
</User>
<User>
<id>2</id>
<username>bob</username>
</User>
<User>
<id>3</id>
<username>bob</username>
</User>
<User>
<id>4</id>
<username>jack</username>
</User>
</Users>
The desired output (in optimal time-complexity):
<Users>
<User>
<id>1</id>
<username>jack01</username>
</User>
<User>
<id>2</id>
<username>bob01</username>
</User>
<User>
<id>3</id>
<username>bob02</username>
</User>
<User>
<id>4</id>
<username>jack02</username>
</User>
</Users>
For this purpose it would be nice to
sort input by username
for each user
when previous username is equals current username
increment counter and
set username to '$username$counter'
otherwise
set counter to 1
(sort by id again - no requirement)
Any thoughts?

This is kind of ugly and I'm not fond of using xsl:for-each, but it should be faster than using preceding-siblings, and doesn't need a 2-pass approach:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:key name="count" match="User" use="username" />
<xsl:template match="Users">
<Users>
<xsl:for-each select="User[generate-id()=generate-id(key('count',username)[1])]">
<xsl:for-each select="key('count',username)">
<User>
<xsl:copy-of select="id" />
<username>
<xsl:value-of select="username" />
<xsl:number value="position()" format="01"/>
</username>
</User>
</xsl:for-each>
</xsl:for-each>
</Users>
</xsl:template>
</xsl:stylesheet>
If you really need it sorted by ID afterwards, you can wrap it into a two-pass template:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<xsl:key name="count" match="User" use="username" />
<xsl:template match="Users">
<xsl:variable name="pass1">
<xsl:for-each select="User[generate-id()=generate-id(key('count',username)[1])]">
<xsl:for-each select="key('count',username)">
<User>
<xsl:copy-of select="id" />
<username>
<xsl:value-of select="username" />
<xsl:number value="position()" format="01"/>
</username>
</User>
</xsl:for-each>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="pass1Nodes" select="msxsl:node-set($pass1)" />
<Users>
<xsl:for-each select="$pass1Nodes/*">
<xsl:sort select="id" />
<xsl:copy-of select="." />
</xsl:for-each>
</Users>
</xsl:template>
</xsl:stylesheet>

This transformation produces exactly the specified wanted result and is efficient (O(N)):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kUserByName" match="User" use="username"/>
<xsl:key name="kUByGid" match="u" use="#gid"/>
<xsl:variable name="vOrderedByName">
<xsl:for-each select=
"/*/User[generate-id()=generate-id(key('kUserByName',username)[1])]">
<xsl:for-each select="key('kUserByName',username)">
<u gid="{generate-id()}" pos="{position()}"/>
</xsl:for-each>
</xsl:for-each>
</xsl:variable>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="username/text()">
<xsl:value-of select="."/>
<xsl:variable name="vGid" select="generate-id(../..)"/>
<xsl:for-each select="ext:node-set($vOrderedByName)[1]">
<xsl:value-of select="format-number(key('kUByGid', $vGid)/#pos, '00')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When applied on the provided XML document:
<Users>
<User>
<id>1</id>
<username>jack</username>
</User>
<User>
<id>2</id>
<username>bob</username>
</User>
<User>
<id>3</id>
<username>bob</username>
</User>
<User>
<id>4</id>
<username>jack</username>
</User>
</Users>
the wanted, correct result is produced:
<Users>
<User>
<id>1</id>
<username>jack01</username>
</User>
<User>
<id>2</id>
<username>bob01</username>
</User>
<User>
<id>3</id>
<username>bob02</username>
</User>
<User>
<id>4</id>
<username>jack02</username>
</User>
</Users>

Here's a slight variation, but possible not a great increase in efficiency
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes"/>
<xsl:key name="User" match="User" use="username" />
<xsl:template match="username/text()">
<xsl:value-of select="." />
<xsl:variable name="id" select="generate-id(..)" />
<xsl:for-each select="key('User', .)">
<xsl:if test="generate-id(username) = $id">
<xsl:number value="position()" format="01"/>
</xsl:if>
</xsl:for-each>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
What this is doing is defining a key to group Users by username. Then, for each username element, you look through the elements in the key for that username, and output the position when you find a match.
One slight advantage of this approach is that you are only looking at user records with the same name. This may be more efficient if you don't have huge numbers of the same name.

Related

XSLT 1.0 transformation

I'm trying to transform a XML file with XSLT 1.0 but I'm having troubles with this.
Input:
<task_order>
<Q>
<record id="1">
<column name="task_externalId">SPLIT4_0</column>
</record>
<record id="2">
<column name="task_externalId">SPLIT4_1</column>
</record>
</Q>
<task>
<id>SPLIT4</id>
<name>test</name>
</task>
</task_order>
Wanted result:
For each task_order element: When there is more than 1 record-element (SPLIT4 and SPLIT4_1) I need to duplicate the task element and change the orginal task-id with the id from record elements.
<task_order>
<Q>
<record id="1">
<column name="task_externalId">SPLIT4_0</column>
</record>
<record id="2">
<column name="task_externalId">SPLIT4_1</column>
</record>
</Q>
<task>
<id>SPLIT4_0</id>
<name>test</name>
</task>
<task>
<id>SPLIT4_1</id>
<name>test</name>
</task>
</task_order>
Any suggestions?
Thank you
First start off with the Identity Template which will handle copying across all existing nodes
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
Next, to make looking up the columns slightly easier, consider using an xsl:key
<xsl:key name="column" match="column" use="substring-before(., '_')" />
Then, you have a template matching task where you can look up all matching column elements using the key, and create a new task element for each
<xsl:template match="task">
<xsl:variable name="task" select="." />
<xsl:for-each select="key('column', id)">
<!-- Create new task -->
</xsl:for-each>
</xsl:template>
Try this XSTL
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:key name="column" match="column" use="substring-before(., '_')" />
<xsl:template match="task">
<xsl:variable name="task" select="." />
<xsl:for-each select="key('column', id)">
<task>
<id><xsl:value-of select="." /></id>
<xsl:apply-templates select="$task/*[not(self::id)]" />
</task>
</xsl:for-each>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

XSLT performance issue with large data

having performance issues with my xslt code:
this is my input file:
<?xml version="1.0" encoding="UTF-8"?>
<Products>
<Product ID="111111" Type="Item" ParentID="7402">
<Name>ABC</Name>
<Values>
<Value AttributeID="11">8.00</Value>
<Value AttributeID="12">8.00</Value>
<Value AttributeID="13">0.18</Value>
</Values>
<Product ID="B582B65D" Type="UID" ParentID="111111">
<Values>
<Value AttributeID="11">8.00</Value>
<Value AttributeID="12">8.00</Value>
<Value AttributeID="13">0.18</Value>
<Value AttributeID="14">0.18</Value>
</Values>
</Product>
</Product>
<Product ID="222222" Type="Item" ParentID="7402">
<Name>XYZ</Name>
<Values>
<Value AttributeID="12">8.00</Value>
<Value AttributeID="13">8.00</Value>
<Value AttributeID="15">0.18</Value>
</Values>
<Product ID="B582B65D" Type="UID" ParentID="111111">
<Values>
<Value AttributeID="11">8.00</Value>
<Value AttributeID="12">8.00</Value>
<Value AttributeID="16">0.18</Value>
<Value AttributeID="18">0.18</Value>
</Values>
</Product>
</Product>
</Products>
and this is my transformation code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:math="http://exslt.org/math"
extension-element-prefixes="math">
<xsl:output method="xml" indent="yes" />
<xsl:param name="file2" select="document('Mapping.xml')" />
<xsl:template match="/Products">
<Products>
<xsl:for-each select="Product">
<xsl:call-template name="item" />
</xsl:for-each>
</Products>
</xsl:template>
<xsl:template name="item">
<Product type="{./#Type}" ID="{./#ID}">
<xsl:for-each select="./Values/Value">
<xsl:variable name="Idval" select="#AttributeID" />
<xsl:element name="{$file2//Groups/AttributeID[#ID=$Idval]/#group}">
<xsl:element name="{$file2//Groups/AttributeID[#ID=$Idval]}">
<xsl:attribute name="ID"><xsl:value-of select="$Idval"/></xsl:attribute>
<xsl:value-of select="." />
</xsl:element>
</xsl:element>
</xsl:for-each>
<xsl:call-template name="uid" />
</Product>
</xsl:template>
<xsl:template name="uid">
<Product type="{./Product/#Type}" ParentId="{./Product/#ParentID}">
<xsl:for-each select="./Product/Values/Value">
<xsl:variable name="Idval" select="#AttributeID" />
<xsl:element name="{$file2//Groups/AttributeID[#ID=$Idval]/#group}">
<xsl:element name="{$file2//Groups/AttributeID[#ID=$Idval]}">
<xsl:attribute name="ID"><xsl:value-of select="$Idval"/></xsl:attribute>
<xsl:value-of select="." />
</xsl:element>
</xsl:element>
</xsl:for-each>
</Product>
</xsl:template>
</xsl:stylesheet>
above xslt is using below xml file for mapping attribute id to corresponding name and group
Mapping.xml
<?xml version="1.0" encoding="UTF-8"?>
<Groups>
<AttributeID ID="11" group="Pack1">Height</AttributeID>
<AttributeID ID="12" group="Pack2">Width</AttributeID>
<AttributeID ID="13" group="Pack1">Depth</AttributeID>
<AttributeID ID="14" group="Pack3">Length</AttributeID>
<AttributeID ID="15" group="Pack3">Lbs</AttributeID>
<AttributeID ID="16" group="Pack4">Litre</AttributeID>
</Groups>
Replace the use of expressions like
select="$file2//Groups/AttributeID[#ID=$Idval]"
with a key:
<xsl:key name="ID" match="Groups/AttributeID" use="#ID"/>
and then
select="key('ID', $IDval, $file)"/>
Alternatively, Saxon-EE will do this optimization for you automatically.
The key() function with 3 arguments is XSLT 2.0 syntax. If you have the misfortune to be using XSLT 1.0, you have to write a dummy xsl:for-each that makes $file the context item, because key() will only select within the document containing the context item.
Define a key for the cross document lookup: <xsl:key name="by-id" match="Groups/AttributeID" use="#ID"/>, then (assuming an XSLT 2.0 processor) you can simplify expressions like <xsl:element name="{$file2//Groups/AttributeID[#ID=$Idval]/#group}"> to <xsl:element name="{key('by-id', #AttributeID, $file2)/#group">. Make the same change for the other cross references you have, i.e. all those $file2//Groups/AttributeID[#ID=$Idval] expressions should use the key lookup.
Making the simplified assumption that your second file isn't too big, you want to fold the values there into your template. It would work with XSLT 1.0 too. Something like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:math="http://exslt.org/math"
extension-element-prefixes="math">
<xsl:output method="xml" indent="yes" />
<xsl:template match="/">
<Products>
<xsl:apply-templates select="/Products/Product" />
</Products>
</xsl:template>
<xsl:template match="Product">
<xsl:element name="Product">
<xsl:apply-templates select="#*" />
<xsl:apply-templates />
</xsl:element>
</xsl:template>
<xsl:template match="Name">
<Name>
<xsl:value-of select="." />
</Name>
</xsl:template>
<xsl:template match="#*">
<xsl:attribute name="{name()}">
<xsl:value-of select="." />
</xsl:attribute>
</xsl:template>
<xsl:template match="Values">
<Values>
<xsl:apply-templates />
</Values>
</xsl:template>
<!-- Templates for individual AttributeIDs, only when there are few -->
<xsl:template match="Value[#AttributeID='11']">
<Pack1>
<xsl:element name="Height">
<xsl:attribute name="ID">
<xsl:value-of select="#AttributeID" />
</xsl:attribute>
<xsl:value-of select="." />
</xsl:element>
</Pack1>
</xsl:template>
<!-- Repeat for the other AttributeID values -->
</xsl:stylesheet>
(Typed off my head, will contain typos)
Of course if it is big Michael's advice is the best course of action.

Grouping of grouped data

Input:
<persons>
<person name="John" role="Writer"/>
<person name="John" role="Poet"/>
<person name="Jacob" role="Writer"/>
<person name="Jacob" role="Poet"/>
<person name="Joe" role="Poet"/>
</persons>
Expected Output:
<groups>
<group roles="Wriet, Poet" persons="John, Jacob"/>
<group roles="Poet" persons="Joe"/>
</groups>
As in the above example, I first need to group on person names and find everyone's roles. If more than one person is found to have the same set of roles (e.g. both John and Jacob are both Writer and Poet), then I need to group on each set of roles and list the person names.
I can do this for the first level of grouping using Muenchian method or EXSLT set:distinct etc.
<groups>
<group roles="Wriet, Poet" persons="John"/>
<group roles="Wriet, Poet" persons="Jacob"/>
<group roles="Poet" persons="Joe"/>
</groups>
The above was transformed using XSLT 1.0 and EXSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sets="http://exslt.org/sets" extension-element-prefixes="sets">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:key name="persons-by-name" match="person" use="#name"/>
<xsl:template match="persons">
<groups>
<xsl:for-each select="sets:distinct(person/#name)">
<group>
<xsl:attribute name="persons"><xsl:value-of select="."/></xsl:attribute>
<xsl:attribute name="roles">
<xsl:for-each select="key('persons-by-name', .)">
<xsl:value-of select="#role"/>
<xsl:if test="position()!=last()"><xsl:text>, </xsl:text></xsl:if>
</xsl:for-each>
</xsl:attribute>
</group>
</xsl:for-each>
</groups>
</xsl:template>
</xsl:stylesheet>
However, I need help to understand how to group on the grouped roles.
If XSLT 1.0 solution is not available, please feel free to recommend XSLT 2.0 approach.
Try it this way?
XSLT 1.0
(using EXSLT node-set() and distinct() functions)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
xmlns:set="http://exslt.org/sets"
extension-element-prefixes="exsl set">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:key name="person-by-name" match="person" use="#name" />
<xsl:key name="person-by-roles" match="person" use="#roles" />
<xsl:variable name="distinct-persons">
<xsl:for-each select="set:distinct(/persons/person/#name)">
<person name="{.}">
<xsl:attribute name="roles">
<xsl:for-each select="key('person-by-name', .)/#role">
<xsl:sort/>
<xsl:value-of select="." />
<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:attribute>
</person>
</xsl:for-each>
</xsl:variable>
<xsl:template match="/">
<groups>
<xsl:for-each select="set:distinct(exsl:node-set($distinct-persons)/person/#roles)">
<group roles="{.}">
<xsl:attribute name="names">
<xsl:for-each select="key('person-by-roles', .)/#name">
<xsl:value-of select="." />
<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:attribute>
</group>
</xsl:for-each>
</groups>
</xsl:template>
</xsl:stylesheet>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<groups>
<group roles="Poet, Writer" names="John, Jacob"/>
<group roles="Poet" names="Joe"/>
</groups>
I did exactly the same thing as you already did and then went a step further and grouped again. Now I get the following output with your input:
<?xml version="1.0" encoding="UTF-8"?>
<groups>
<group roles="Writer,Poet" persons="John,Jacob"/>
<group roles="Poet" persons="Joe"/>
</groups>
This is the XSLT 2.0
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:avintis="http://www.avintis.com/esb" exclude-result-prefixes="#all" version="2.0">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="/persons">
<groups>
<xsl:variable name="persons" select="."/>
<!-- create a temporary variable containing all roles of a person -->
<xsl:variable name="roles">
<xsl:for-each select="distinct-values(person/#name)">
<xsl:sort select="."/>
<xsl:variable name="name" select="."/>
<xsl:element name="group">
<xsl:attribute name="roles">
<!-- sort the roles of each person -->
<xsl:variable name="rolesSorted">
<xsl:for-each select="$persons/person[#name=$name]">
<xsl:sort select="#role"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="string-join($rolesSorted/person/#role,',')"/>
</xsl:attribute>
<xsl:attribute name="persons" select="."/>
</xsl:element>
</xsl:for-each>
</xsl:variable>
<!-- now loop again over all roles of the persons and group persons having the same roles -->
<xsl:for-each select="distinct-values($roles/group/#roles)">
<xsl:element name="group">
<xsl:variable name="name" select="."/>
<xsl:attribute name="roles" select="$name"/>
<xsl:attribute name="persons">
<xsl:value-of select="string-join($roles/group[#roles=$name]/#persons,',')"/>
</xsl:attribute>
</xsl:element>
</xsl:for-each>
</groups>
</xsl:template>
<xsl:template match="*|text()|#*">
<xsl:copy>
<xsl:apply-templates select="*|text()|#*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The roles get also sorted - so independent from the input order of the roles and persons.

XSLT key only returns a value once

I think I'm missing something obvious here but here goes. I have the below xml and I need to group the KEY nodes of the matched instances together. This is specified by the match attribute and it can contain more than one item number. There can be any number of ITEM nodes and any number of KEY nodes. Also, there is no limit to the depth of the ITEM nodes. And, the matched instances need not be under the same parent. I'm also limited to XSLT 1.0 and the Microsoft parser.
<?xml version="1.0" encoding="utf-8" ?>
<ITEM number='1'>
<ITEM number='2'>
<ITEM number='3' match='5,11'>
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
<ITEM number ='4' />
</ITEM>
<ITEM number='5' match='3,11'>
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
</ITEM>
<ITEM number='6' match='10'>
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key4' value='a' />
</ITEM>
<ITEM number='7' />
<ITEM number='8'>
<KEY name='key1' value='x' />
</ITEM>
</ITEM>
<ITEM number='9'>
<ITEM number='10' match='6'>
<KEY name='key1' value='x' />
<KEY name='key3' value='z' />
<KEY name='key5' value='b' />
</ITEM>
</ITEM>
<ITEM number='11' match='3,5'>
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
</ITEM>
</ITEM>
My expected result would look something like this...
<?xml version="1.0" encoding="utf-8" ?>
<Result>
<Group number="1" />
<Group number="2" />
<Group number="3,5,11">
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
</Group>
<Group number="4" />
<Group number="6,10">
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
<KEY name='key4' value='a' />
<KEY name='key5' value='b' />
</Group>
<Group number="7" />
<Group number="8">
<KEY name='key1' value='x' />
</Group>
<Group number="9" />
</Result>
What I actually get is...
<?xml version="1.0" encoding="utf-8"?>
<Result>
<Group number="1" />
<Group number="2" />
<Group number="3,5,11">
<KEY name="key1" value="x" />
<KEY name="key2" value="y" />
<KEY name="key3" value="z" />
</Group>
<Group number="4" />
<Group number="6,10">
<KEY name="key4" value="a" />
<KEY name="key5" value="b" />
</Group>
<Group number="7" />
<Group number="8" />
<Group number="9" />
</Result>
I'm using a key and it looks like once I access that particular value from the key function, I cannot access it again. Group number 6,10 should contain all 5 keys but is missing the first 3 which are already present in group number 3,5. Similarly for group number 8, it should contain 1 key. I've used recursion to skip over the matched instances but I don't think there is any issue over there, it seems to be related to the key functionality. I've attached my xslt below, please take a look and tell me what I'm doing wrong. Any tips for performance improvements are also appreciated :)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="kKeyByName" match="KEY" use="#name" />
<xsl:template name="ProcessItem">
<!--pItemsList - node set containing items that need to be processed-->
<xsl:param name="pItemsList" />
<!--pProcessedList - string containing processed item numbers in the format |1|2|3|-->
<xsl:param name="pProcessedList" />
<xsl:variable name="vCurrItem" select="$pItemsList[1]" />
<!--Recursion exit condition - check if we have a valid Item-->
<xsl:if test="$vCurrItem">
<xsl:variable name="vNum" select="$vCurrItem/#number" />
<!--Skip processed instances-->
<xsl:if test="not(contains($pProcessedList, concat('|', $vNum, '|')))">
<xsl:element name="Group">
<!--If the item is matched with another item, only the distinct keys of the 2 should be displayed-->
<xsl:choose>
<xsl:when test="$vCurrItem/#match">
<xsl:attribute name="number">
<xsl:value-of select="concat($vNum, ',', $vCurrItem/#match)" />
</xsl:attribute>
<xsl:for-each select="(//ITEM[#number=$vNum or #match=$vNum]/KEY)[generate-id(.)=generate-id(key('kKeyByName', #name)[1])]">
<xsl:apply-templates select="." />
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="number">
<xsl:value-of select="$vNum" />
</xsl:attribute>
<xsl:apply-templates select="KEY" />
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:if>
<!--Append processed instances to list to pass on in recursive function-->
<xsl:variable name="vNewList">
<xsl:value-of select="$pProcessedList" />
<xsl:value-of select="concat($vNum, '|')" />
<xsl:if test="$vCurrItem/#match">
<xsl:value-of select="concat($vCurrItem/#match, '|')" />
</xsl:if>
</xsl:variable>
<!--Call template recursively to process the rest of the instances-->
<xsl:call-template name="ProcessItem">
<xsl:with-param name="pItemsList" select="$pItemsList[position() > 1]" />
<xsl:with-param name="pProcessedList" select="$vNewList" />
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="KEY">
<xsl:copy>
<xsl:copy-of select="#*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:element name="Result">
<xsl:call-template name="ProcessItem">
<xsl:with-param name="pItemsList" select="//ITEM" />
<xsl:with-param name="pProcessedList" select="'|'" />
</xsl:call-template>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
IF there is only one match or none to each item you can give the following xslt a try:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:key name="kItemNr" match="ITEM" use="#number" />
<xsl:key name="kNumberKey" match="KEY" use="concat(../#number, '|', #name )" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ITEM">
<xsl:if test="not(preceding::ITEM[#number = current()/#match])" >
<Group>
<xsl:attribute name="number">
<xsl:value-of select="#number"/>
<xsl:if test="#match" >
<xsl:text>,</xsl:text>
<xsl:value-of select="#match"/>
</xsl:if>
</xsl:attribute>
<xsl:variable name="itemNr" select="#number"/>
<xsl:apply-templates select="KEY | key('kItemNr',#match )/KEY[
not (key('kNumberKey', concat($itemNr, '|', #name) ) )] ">
<xsl:sort select="#name"/>
</xsl:apply-templates>
</Group>
</xsl:if>
</xsl:template>
<xsl:template match="/" >
<Result>
<xsl:for-each select="//ITEM[count(. | key('kItemNr',number ) ) = 1 ]" >
<xsl:apply-templates select="." />
</xsl:for-each>
</Result>
</xsl:template>
</xsl:stylesheet>
Which will generate the following output:
<?xml version="1.0"?>
<Result>
<Group number="1"/>
<Group number="2"/>
<Group number="3,5">
<KEY name="key1" value="x"/>
<KEY name="key2" value="y"/>
<KEY name="key3" value="z"/>
</Group>
<Group number="4"/>
<Group number="6,10">
<KEY name="key1" value="x"/>
<KEY name="key2" value="y"/>
<KEY name="key3" value="z"/>
<KEY name="key4" value="a"/>
<KEY name="key5" value="b"/>
</Group>
<Group number="7"/>
<Group number="8">
<KEY name="key1" value="x"/>
</Group>
<Group number="9"/>
</Result>
Update because of changed request:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes"/>
<xsl:key name="kItemNr" match="ITEM" use="#number" />
<xsl:template match="#*|node()">
<xsl:copy >
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ITEM">
<xsl:variable name="matchStr" select=" concat(',', current()/#match, ',')"/>
<xsl:if test="not(preceding::ITEM[ contains($matchStr, concat(',', #number, ',') )])" >
<Group>
<xsl:attribute name="number">
<xsl:value-of select="#number"/>
<xsl:if test="#match" >
<xsl:text>,</xsl:text>
<xsl:value-of select="#match"/>
</xsl:if>
</xsl:attribute>
<xsl:apply-templates select="(KEY |
//ITEM[
contains( $matchStr, concat(',', #number, ',') )
]/KEY[
not((preceding::ITEM[
contains( $matchStr, concat(',', #number, ',') )
] | current() )/KEY/#name = #name)
]) ">
<xsl:sort select="#name"/>
</xsl:apply-templates>
</Group>
</xsl:if>
</xsl:template>
<xsl:template match="/" >
<Result>
<xsl:for-each select="//ITEM[count(. | key('kItemNr',number ) ) = 1 ]" >
<xsl:apply-templates select="." />
</xsl:for-each>
</Result>
</xsl:template>
</xsl:stylesheet>
This may be quite slow for bigger input data but any way.

Transform to nest items within one list when source contains flat tagging

I have the following XML file:
<li id="s9781452281988.n39.i34"><i>See also</i>
<a class="term-ref" id="s9781452281988.n39.i6525" href="#s9781452281988.n39.i1899">Emotion</a>;
<a class="term-ref" id="s9781452281988.n39.i6526" href="#s9781452281988.n39.i3312">Interpersonal conflict</a></li>
And I want the output to be the following:
<item>See also
<list rend="runon">
<item><term>Emotion</term></item>
<item><term>Interpersonal conflict</term></item>
</list>
</item>
Basically if I have multiple a[#class='term-ref'], the first instance should start the list rend="runon" and subsequent a[#class='term-ref'] should be included as item/term within the list.
The below was my try, but it is not working as I had hoped, and is closing the list before the second item/term (elements which are also not being output):
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="2.0">
<xsl:template match="li">
<xsl:element name="item">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="a[#class='term-ref'][1]">
<xsl:element name="list">
<xsl:attribute name="rend" select="'runon'"/>
<xsl:element name="item">
<xsl:element name="term">
<xsl:apply-templates/>
</xsl:element>
</xsl:element>
<xsl:if test="a[#class='term-ref'][position() >1]">
<xsl:element name="item">
<xsl:element name="term">
<xsl:apply-templates/>
</xsl:element>
</xsl:element>
</xsl:if>
</xsl:element>
</xsl:template>
<xsl:template match="li//text()">
<xsl:value-of select="translate(., '.,;', '')"/>
</xsl:template>
</xsl:stylesheet>
On the source, XML, the above stylesheet produces this output:
<item>See also
<list rend="runon">
<item><term>Emotion</term></item>
</list>
Interpersonal conflict</item>
Which is incorrect.
What am i doing wrong?
This short transformation (almost completely "push style", with no conditional instructions, no xsl:element and no unnecessary function calls like translate() or replace()):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="li">
<item><xsl:apply-templates/></item>
</xsl:template>
<xsl:template match="a[#class='term-ref'][1]">
<list rend="runon">
<xsl:apply-templates mode="group"
select="../a[#class='term-ref']"/>
</list>
</xsl:template>
<xsl:template match="a[#class='term-ref']" mode="group">
<item><term><xsl:apply-templates/></term></item>
</xsl:template>
<xsl:template match="a[#class='term-ref']|li/text()" priority="-1"/>
</xsl:stylesheet>
when applied on the provided XML document -- which is well-formed:
<li id="s9781452281988.n39.i34"><i>See also</i>
<a class="term-ref" id="s9781452281988.n39.i6525"
href="#s9781452281988.n39.i1899">Emotion</a>;
<a class="term-ref" id="s9781452281988.n39.i6526"
href="#s9781452281988.n39.i3312">Interpersonal conflict.</a>.
</li>
produces the wanted, correct result:
<item>See also<list rend="runon">
<item>
<term>Emotion</term>
</item>
<item>
<term>Interpersonal conflict.</term>
</item>
</list>
</item>
This should work...
XML Input (well-formed)
<doc>
<li id="s9781452281988.n39.i34"><i>See also</i>
<a class="term-ref" id="s9781452281988.n39.i6525" href="#s9781452281988.n39.i1899">Emotion</a>;
<a class="term-ref" id="s9781452281988.n39.i6526" href="#s9781452281988.n39.i3312">Interpersonal conflict.</a>.
</li>
</doc>
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="li">
<item>
<xsl:apply-templates select="i/text()"/>
<xsl:if test="a">
<list rend="runon">
<xsl:apply-templates select="a"/>
</list>
</xsl:if>
</item>
</xsl:template>
<xsl:template match="a">
<item><term><xsl:apply-templates select="node()"/></term></item>
</xsl:template>
<xsl:template match="li//text()">
<xsl:value-of select="replace(.,'[.,;]','')"/>
</xsl:template>
</xsl:stylesheet>
Output
<doc>
<item>See also<list rend="runon">
<item>
<term>Emotion</term>
</item>
<item>
<term>Interpersonal conflict</term>
</item>
</list>
</item>
</doc>
This should do what you are looking to do:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:template match="li">
<xsl:element name="item">
<xsl:apply-templates select="node()" />
<xsl:apply-templates select="." mode="items" />
</xsl:element>
</xsl:template>
<xsl:template match="li//text()">
<xsl:value-of select="normalize-space(translate(., '.,;', ''))"/>
</xsl:template>
<xsl:template match="a[#class = 'term-ref']" />
<xsl:template match="node()" mode="items" />
<xsl:template match="li" mode="items">
<xsl:apply-templates mode="items" />
</xsl:template>
<xsl:template match="li[count(a[#class = 'term-ref']) > 1]" mode="items">
<list rend="runon">
<xsl:apply-templates select="a[#class = 'term-ref']" mode="items" />
</list>
</xsl:template>
<xsl:template match="a[#class = 'term-ref']" mode="items">
<item>
<term>
<xsl:value-of select="."/>
</term>
</item>
</xsl:template>
</xsl:stylesheet>
When run on your sample input, this produces:
<item>
See also<list rend="runon">
<item>
<term>Emotion</term>
</item>
<item>
<term>Interpersonal conflict</term>
</item>
</list>
</item>
When run on an input file with just one a.term-ref, this produces:
<item>
See also<item>
<term>Interpersonal conflict</term>
</item>
</item>