Replace parts of attribute values with XSLT

I want to replace a substring of an attribute value with another value. In the sample below, I would like to take all elements with the attribute tagName = "blubb" and, in their tagValue, replace the string "abc" with "xyz". The string "def" in the same attribute (if it exists) should also be replaced with "AAA".
Sample Input:
<shop>
<items>
<item id="1">
<tag tagName = "Description" tagValue ="Item 1" />
<tag tagName = "Price" tagValue = "5.00" />
<tag tagName = "Currency" tagValue = "USD" />
<tag tagName = "blubb" tagValue = "abc,def,ghi,jkl" />
</item>
<item id="2">
<tag tagName = "Description" tagValue ="Item 2" />
<tag tagName = "Price" tagValue = "5.00" />
<tag tagName = "Currency" tagValue = "EUR" />
<tag tagName = "blubb" tagValue = "def,ghi,jkl" />
</item>
<item id="2">
<tag tagName = "Description" tagValue ="Item 2" />
<tag tagName = "Price" tagValue = "5.00" />
<tag tagName = "Currency" tagValue = "EUR" />
<tag tagName = "blubb" tagValue = "abc,def,jkl" />
</item>
</items>
</shop>
Expected Output (abc replaced with xyz and def replaced with AAA)
<shop>
<items>
<item id="1">
<tag tagName = "Description" tagValue ="Item 1" />
<tag tagName = "Price" tagValue = "5.00" />
<tag tagName = "Currency" tagValue = "USD" />
<tag tagName = "blubb" tagValue = "xyz,AAA,ghi,jkl" />
</item>
<item id="2">
<tag tagName = "Description" tagValue ="Item 2" />
<tag tagName = "Price" tagValue = "5.00" />
<tag tagName = "Currency" tagValue = "EUR" />
<tag tagName = "blubb" tagValue = "AAA,ghi,jkl" />
</item>
<item id="2">
<tag tagName = "Description" tagValue ="Item 2" />
<tag tagName = "Price" tagValue = "5.00" />
<tag tagName = "Currency" tagValue = "EUR" />
<tag tagName = "blubb" tagValue = "xyz,AAA,jkl" />
</item>
</items>
</shop>
Is that possible with xslt?
Thanks!
UPDATE - I tried adapting my XSL to use the replace function. I had a copy-of before and changed it to a copy as in the sample below, but then I no longer got any data, so I tried making it work with copy-of again.
When I do that, it doesn't replace anything. I suppose that is because I have xsl:template in it twice, is that right?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="no" indent="yes"/>
<xsl:template match="items">
<xsl:copy-of select="item[@type='DEVICE']/tag[@tagName='Currency' and starts-with(@tagValue,'EUR')]/.."/>
</xsl:template>
<xsl:template match="item/tag[@tagName='blubb']">
<xsl:param name="tagValue" />
<xsl:variable name="tagValue" select="replace($tagValue,'abc','xyz')"/>
<xsl:variable name="tagValue" select="replace($tagValue,'def','AAA')"/>
</xsl:template>
</xsl:stylesheet>

You can try this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="xml" omit-xml-declaration="no"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="tag[@tagName[.='blubb']]/@tagValue">
<xsl:attribute name="tagValue">
<xsl:analyze-string select="." regex="(abc)|(def)">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1)">
<xsl:text>xyz</xsl:text>
</xsl:when>
<xsl:when test="regex-group(2)">
<xsl:text>AAA</xsl:text>
</xsl:when>
</xsl:choose>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
This uses xsl:analyze-string to do the replacement.

You can refer to this article: XSLT string replace
To summarize:
For XSLT 2.0:
Use the replace function: <xsl:variable name="text" select="replace($text,'word_to_be_replaced','word_to_replace')"/>
For XSLT 1.0:
Test whether the string to replace exists (<xsl:when test="contains($text, $word_to_be_replaced)">).
If so, output the substring before it (<xsl:value-of select="substring-before($text,$word_to_be_replaced)" />), then the replacement word, then the substring after it (<xsl:value-of select="substring-after($text,$word_to_be_replaced)" />), applying the same test recursively to that remainder if more than one occurrence may be present.
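The XSLT 1.0 steps above can be sketched as a recursive named template. This is only a minimal sketch: the template name replace-string and its parameters text, from and to are made up for illustration, and it does a plain substring replacement (not a whole-token one), applied here to the tagValue attribute from the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: replaces every occurrence of $from in $text with $to -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <!-- Guard against an empty $from, which contains() would always match -->
      <xsl:when test="$from != '' and contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <!-- Recurse on the remainder to catch further occurrences -->
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Apply both replacements to tagValue of the 'blubb' tags -->
  <xsl:template match="tag[@tagName='blubb']/@tagValue">
    <xsl:variable name="step1">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="from" select="'abc'"/>
        <xsl:with-param name="to" select="'xyz'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:attribute name="tagValue">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="$step1"/>
        <xsl:with-param name="from" select="'def'"/>
        <xsl:with-param name="to" select="'AAA'"/>
      </xsl:call-template>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

Chaining the two calls through a variable keeps the helper generic; for more than a handful of replacements a different design would probably be easier to maintain.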

Beware of using a simple:
replace($value,'abc','XYZ')
You might get false positives when a token merely contains the substring abc, e.g. abcdef or deabcef or defabc.
To make sure you only replace a whole token abc, use:
replace($value, '(^|,)abc(,|$)', '$1XYZ$2')
Demo: https://xsltfiddle.liberty-development.net/ej9EGdo
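As a possible alternative to the regular expression, the value can be split on commas and each token compared as a whole. The sketch below assumes XSLT 2.0 and the tag/tagName/tagValue structure from the question; it is a different technique from the replace() call above, not a drop-in copy of it. One case where the two may differ is a value such as abc,abc: the regex consumes the comma after the first token, so the second token has no leading comma or start-of-string left to match, whereas per-token comparison handles both.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rewrite tagValue only on tag elements whose tagName is 'blubb' -->
  <xsl:template match="tag[@tagName='blubb']/@tagValue">
    <!-- Split on commas, map whole tokens, and join the result again -->
    <xsl:attribute name="tagValue"
      select="string-join(
                for $t in tokenize(., ',')
                return if ($t = 'abc') then 'xyz'
                       else if ($t = 'def') then 'AAA'
                       else $t,
                ',')"/>
  </xsl:template>

</xsl:stylesheet>
```

Tokens such as abcdef are left alone because the comparison is on the complete token rather than on a substring.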

Related

position() != last() not working

I have an XML like:
<ast>
<group>
<Set>
<location line="1" column="22" offset="22"/>
<group>
<Id value="foo">
<location line="1" column="31" offset="31"/>
</Id>
</group>
<group>
<Function>
<location line="1" column="22" offset="22"/>
<end-location line="1" column="49" offset="49"/>
<group>
<Id value="a">
<location line="1" column="35" offset="35"/>
</Id>
<Id value="b">
<location line="1" column="37" offset="37"/>
</Id>
</group>
<group>
<Return>
<location line="1" column="40" offset="40"/>
<Number value="0">
<location line="1" column="47" offset="47"/>
</Number>
</Return>
</group>
</Function>
</group>
</Set>
...
</group>
</ast>
which I process with this template:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="text()" /><!-- remove blanks -->
<xsl:template match="Set[group[position()=1]/Id][group[position()=2]/Function]">
<function-def>
<xsl:attribute name="name">
<xsl:value-of select="group[position()=1]/Id/@value" />
<xsl:text>(</xsl:text>
<xsl:apply-templates select="group[position()=2]/Function" />
<xsl:text>)</xsl:text>
</xsl:attribute>
<xsl:copy-of select="group/Function/location" />
<xsl:copy-of select="group/Function/end-location" />
</function-def>
</xsl:template>
<xsl:template match="Function/group[position()=1]/Id">
<xsl:value-of select="@value" />
<xsl:if test="position() != last()"><xsl:text>,</xsl:text></xsl:if>
</xsl:template>
</xsl:stylesheet>
however the condition position() != last() on the last template is not working. Why?
The output renders as:
<?xml version="1.0"?>
<function-def name="foo(a,b,)">...
while it should be:
<?xml version="1.0"?>
<function-def name="foo(a,b)">...
It is working, but not in the way you think....
In your first template, you have this xsl:apply-templates
<xsl:apply-templates select="group[position()=2]/Function" />
But you have no template matching Function, so XSLT's built-in template rule kicks in, which is this...
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
This will select the group elements, for which again there is no template. Now, when it does <xsl:apply-templates/>, it selects all child nodes, including the whitespace-only text nodes used to indent the XML.
The problem is that when you test position() != last(), you are testing the position of the element within the set of all child nodes that were selected, which includes the text nodes. There is a whitespace-only text node after the last Id, so it may be the last Id element, but it is not the last child node.
One solution is to tell XSLT to strip whitespace-only text nodes, so that the last Id does become the last child node:
<xsl:strip-space elements="*" />
Alternatively, you can add a template matching group and explicitly select only the Id nodes:
<xsl:template match="Function/group[position()=1]">
<xsl:apply-templates select="Id" />
</xsl:template>
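A third variation on the same idea, sketched here under the assumption that the input looks like the sample in the question: select the parameter Id elements directly from the Set template, so that the built-in rules are never involved and position() and last() are evaluated against the Id elements only.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Suppress all text nodes, as in the original stylesheet -->
  <xsl:template match="text()"/>

  <xsl:template match="Set[group[1]/Id][group[2]/Function]">
    <function-def>
      <xsl:attribute name="name">
        <xsl:value-of select="group[1]/Id/@value"/>
        <xsl:text>(</xsl:text>
        <!-- Only the Id elements are in the current node list here -->
        <xsl:apply-templates select="group[2]/Function/group[1]/Id"/>
        <xsl:text>)</xsl:text>
      </xsl:attribute>
      <xsl:copy-of select="group/Function/location"/>
      <xsl:copy-of select="group/Function/end-location"/>
    </function-def>
  </xsl:template>

  <xsl:template match="Function/group[1]/Id">
    <xsl:value-of select="@value"/>
    <!-- No trailing comma after the last selected Id -->
    <xsl:if test="position() != last()"><xsl:text>,</xsl:text></xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input this should give name="foo(a,b)", because the comma test no longer counts whitespace text nodes.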

XSLT key only returns a value once

I think I'm missing something obvious here but here goes. I have the below xml and I need to group the KEY nodes of the matched instances together. This is specified by the match attribute and it can contain more than one item number. There can be any number of ITEM nodes and any number of KEY nodes. Also, there is no limit to the depth of the ITEM nodes. And, the matched instances need not be under the same parent. I'm also limited to XSLT 1.0 and the Microsoft parser.
<?xml version="1.0" encoding="utf-8" ?>
<ITEM number='1'>
<ITEM number='2'>
<ITEM number='3' match='5,11'>
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
<ITEM number ='4' />
</ITEM>
<ITEM number='5' match='3,11'>
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
</ITEM>
<ITEM number='6' match='10'>
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key4' value='a' />
</ITEM>
<ITEM number='7' />
<ITEM number='8'>
<KEY name='key1' value='x' />
</ITEM>
</ITEM>
<ITEM number='9'>
<ITEM number='10' match='6'>
<KEY name='key1' value='x' />
<KEY name='key3' value='z' />
<KEY name='key5' value='b' />
</ITEM>
</ITEM>
<ITEM number='11' match='3,5'>
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
</ITEM>
</ITEM>
My expected result would look something like this...
<?xml version="1.0" encoding="utf-8" ?>
<Result>
<Group number="1" />
<Group number="2" />
<Group number="3,5,11">
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
</Group>
<Group number="4" />
<Group number="6,10">
<KEY name='key1' value='x' />
<KEY name='key2' value='y' />
<KEY name='key3' value='z' />
<KEY name='key4' value='a' />
<KEY name='key5' value='b' />
</Group>
<Group number="7" />
<Group number="8">
<KEY name='key1' value='x' />
</Group>
<Group number="9" />
</Result>
What I actually get is...
<?xml version="1.0" encoding="utf-8"?>
<Result>
<Group number="1" />
<Group number="2" />
<Group number="3,5,11">
<KEY name="key1" value="x" />
<KEY name="key2" value="y" />
<KEY name="key3" value="z" />
</Group>
<Group number="4" />
<Group number="6,10">
<KEY name="key4" value="a" />
<KEY name="key5" value="b" />
</Group>
<Group number="7" />
<Group number="8" />
<Group number="9" />
</Result>
I'm using a key and it looks like once I access that particular value from the key function, I cannot access it again. Group number 6,10 should contain all 5 keys but is missing the first 3 which are already present in group number 3,5. Similarly for group number 8, it should contain 1 key. I've used recursion to skip over the matched instances but I don't think there is any issue over there, it seems to be related to the key functionality. I've attached my xslt below, please take a look and tell me what I'm doing wrong. Any tips for performance improvements are also appreciated :)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="kKeyByName" match="KEY" use="@name" />
<xsl:template name="ProcessItem">
<!--pItemsList - node set containing items that need to be processed-->
<xsl:param name="pItemsList" />
<!--pProcessedList - string containing processed item numbers in the format |1|2|3|-->
<xsl:param name="pProcessedList" />
<xsl:variable name="vCurrItem" select="$pItemsList[1]" />
<!--Recursion exit condition - check if we have a valid Item-->
<xsl:if test="$vCurrItem">
<xsl:variable name="vNum" select="$vCurrItem/@number" />
<!--Skip processed instances-->
<xsl:if test="not(contains($pProcessedList, concat('|', $vNum, '|')))">
<xsl:element name="Group">
<!--If the item is matched with another item, only the distinct keys of the 2 should be displayed-->
<xsl:choose>
<xsl:when test="$vCurrItem/@match">
<xsl:attribute name="number">
<xsl:value-of select="concat($vNum, ',', $vCurrItem/@match)" />
</xsl:attribute>
<xsl:for-each select="(//ITEM[@number=$vNum or @match=$vNum]/KEY)[generate-id(.)=generate-id(key('kKeyByName', @name)[1])]">
<xsl:apply-templates select="." />
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="number">
<xsl:value-of select="$vNum" />
</xsl:attribute>
<xsl:apply-templates select="KEY" />
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:if>
<!--Append processed instances to list to pass on in recursive function-->
<xsl:variable name="vNewList">
<xsl:value-of select="$pProcessedList" />
<xsl:value-of select="concat($vNum, '|')" />
<xsl:if test="$vCurrItem/@match">
<xsl:value-of select="concat($vCurrItem/@match, '|')" />
</xsl:if>
</xsl:variable>
<!--Call template recursively to process the rest of the instances-->
<xsl:call-template name="ProcessItem">
<xsl:with-param name="pItemsList" select="$pItemsList[position() > 1]" />
<xsl:with-param name="pProcessedList" select="$vNewList" />
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="KEY">
<xsl:copy>
<xsl:copy-of select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:element name="Result">
<xsl:call-template name="ProcessItem">
<xsl:with-param name="pItemsList" select="//ITEM" />
<xsl:with-param name="pProcessedList" select="'|'" />
</xsl:call-template>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
If each item has at most one match, you can give the following XSLT a try:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:key name="kItemNr" match="ITEM" use="@number" />
<xsl:key name="kNumberKey" match="KEY" use="concat(../@number, '|', @name )" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ITEM">
<xsl:if test="not(preceding::ITEM[@number = current()/@match])" >
<Group>
<xsl:attribute name="number">
<xsl:value-of select="@number"/>
<xsl:if test="@match" >
<xsl:text>,</xsl:text>
<xsl:value-of select="@match"/>
</xsl:if>
</xsl:attribute>
<xsl:variable name="itemNr" select="@number"/>
<xsl:apply-templates select="KEY | key('kItemNr',@match )/KEY[
not (key('kNumberKey', concat($itemNr, '|', @name) ) )] ">
<xsl:sort select="@name"/>
</xsl:apply-templates>
</Group>
</xsl:if>
</xsl:template>
<xsl:template match="/" >
<Result>
<xsl:for-each select="//ITEM[count(. | key('kItemNr',number ) ) = 1 ]" >
<xsl:apply-templates select="." />
</xsl:for-each>
</Result>
</xsl:template>
</xsl:stylesheet>
Which will generate the following output:
<?xml version="1.0"?>
<Result>
<Group number="1"/>
<Group number="2"/>
<Group number="3,5">
<KEY name="key1" value="x"/>
<KEY name="key2" value="y"/>
<KEY name="key3" value="z"/>
</Group>
<Group number="4"/>
<Group number="6,10">
<KEY name="key1" value="x"/>
<KEY name="key2" value="y"/>
<KEY name="key3" value="z"/>
<KEY name="key4" value="a"/>
<KEY name="key5" value="b"/>
</Group>
<Group number="7"/>
<Group number="8">
<KEY name="key1" value="x"/>
</Group>
<Group number="9"/>
</Result>
Update because of changed request:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes"/>
<xsl:key name="kItemNr" match="ITEM" use="@number" />
<xsl:template match="@*|node()">
<xsl:copy >
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ITEM">
<xsl:variable name="matchStr" select=" concat(',', current()/@match, ',')"/>
<xsl:if test="not(preceding::ITEM[ contains($matchStr, concat(',', @number, ',') )])" >
<Group>
<xsl:attribute name="number">
<xsl:value-of select="@number"/>
<xsl:if test="@match" >
<xsl:text>,</xsl:text>
<xsl:value-of select="@match"/>
</xsl:if>
</xsl:attribute>
<xsl:apply-templates select="(KEY |
//ITEM[
contains( $matchStr, concat(',', @number, ',') )
]/KEY[
not((preceding::ITEM[
contains( $matchStr, concat(',', @number, ',') )
] | current() )/KEY/@name = @name)
]) ">
<xsl:sort select="@name"/>
</xsl:apply-templates>
</Group>
</xsl:if>
</xsl:template>
<xsl:template match="/" >
<Result>
<xsl:for-each select="//ITEM[count(. | key('kItemNr',number ) ) = 1 ]" >
<xsl:apply-templates select="." />
</xsl:for-each>
</Result>
</xsl:template>
</xsl:stylesheet>
This may be quite slow for larger inputs, but it should do the job.

Get distinct values from xml

My sample XML is shown below. I need to get the distinct states from it. I am using XSLT 1.0 in the VS 2010 editor.
<?xml version="1.0" encoding="utf-8" ?>
<states>
<node>
<value>2</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>NJ</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
</states>
My xslt looks like below:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
xmlns:user="urn:my-scripts">
<xsl:output method="text" indent="yes"/>
<xsl:key name="st" match="//states/node/state" use="." />
<xsl:variable name="disst">
<xsl:for-each select="//states/node[contains(value,1)]/state[generate-id()=generate-id(key('st',.)[1])]" >
<xsl:choose>
<xsl:when test="(position() != 1)">
<xsl:value-of select="concat(', ',.)" disable-output-escaping="yes"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:variable>
<xsl:template match="/" >
<xsl:value-of disable-output-escaping="yes" select="$disst"/>
</xsl:template>
</xsl:stylesheet>
Output: DE,NJ,NY
My XSLT above works for the test XML above.
If I change the XML as below:
<?xml version="1.0" encoding="utf-8" ?>
<states>
<node>
<value>2</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>NJ</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
</states>
It is not picking up the state DE. Can anyone suggest a suitable solution? Thanks in advance.
I need to find the distinct states in the XML.
The problem here is your use of a predicate in your Muenchian grouping XPath:
[contains(value,1)]
This will often make Muenchian grouping fail to find all of the available distinct values. Instead, you should add the predicate to the key:
<xsl:key name="st" match="//states/node[contains(value, 1)]/state" use="." />
Alternatively, you can apply the predicate inside the grouping statement:
<xsl:apply-templates
select="//states/node
/state[generate-id() =
generate-id(key('st',.)[contains(../value, 1)][1])]" />
Full XSLT (with some improvements):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:user="urn:my-scripts">
<xsl:output method="text" indent="yes"/>
<xsl:key name="st" match="//states/node/state" use="." />
<xsl:variable name="a" select="1" />
<xsl:variable name="disst">
<xsl:apply-templates
select="//states/node
/state[generate-id() =
generate-id(key('st',.)[contains(../value, $a)][1])]" />
</xsl:variable>
<xsl:template match="state">
<xsl:if test="position() > 1">
<xsl:text>,</xsl:text>
</xsl:if>
<xsl:value-of select ="." disable-output-escaping="yes" />
</xsl:template>
<xsl:template match="/" >
<xsl:value-of disable-output-escaping="yes" select="$disst"/>
</xsl:template>
</xsl:stylesheet>
Result when run on your sample XML:
DE,NJ,NY
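For comparison, the first suggestion (moving the predicate into the key) could look roughly like the stylesheet below. It is a sketch under the assumption that the input has the states/node/value/state shape shown in the question; the key name st is reused from above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index only the state elements whose sibling value contains 1 -->
  <xsl:key name="st" match="states/node[contains(value, 1)]/state" use="."/>

  <xsl:template match="/">
    <!-- Keep a state only if it is the first one indexed under its own text -->
    <xsl:for-each select="states/node[contains(value, 1)]/state
                          [generate-id() = generate-id(key('st', .)[1])]">
      <xsl:if test="position() &gt; 1">,</xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the nodes filtered out by the predicate never enter the key, the first node of each group is always one that the for-each can actually select, which is what the generate-id() comparison relies on.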

Merging pairs of nodes based on attribute, new to template matching

Say I have the following XML:
<root>
<tokens>
<token ID="t1">blah</token>
<token ID="t2">blabla</token>
<token ID="t3">shovel</token>
</tokens>
<relatedStuff>
<group gID="s1">
<references tokID="t1"/>
<references tokID="t2"/>
</group>
<group gID="s2">
<references tokID="t3"/>
</group>
</relatedStuff>
</root>
Now, considering that a for-each loop for every token would be pretty inefficient and a bad idea, how would one go about using template matching, to transform this xml into the following?
<s id="everything_merged">
<tok id="t1" gID="s1" >blah</tok>
<tok id="t2" gID="s1" >blabla</tok>
<tok id="t3" gID="s2" >shovel</tok>
</s>
All I want from <s> is the "gID", the gID corresponding to the token in the <tokens>.
<xsl:for-each select="b:root/a:tokens/a:token">
<!-- and here some template matching -->
<xsl:attribute name="gID">
<xsl:value-of select="--correspondingNode's--@gID"/>
</xsl:attribute>
</xsl:for-each>
I'm pretty fuzzy on this sort of thing, so thank you very much for any help!
The following stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<s id="everything_merged">
<xsl:apply-templates select="/root/tokens/token" />
</s>
</xsl:template>
<xsl:template match="token">
<tok id="{@ID}" gID="{/root/relatedStuff/group[
references[@tokID=current()/@ID]]/@gID}">
<xsl:apply-templates />
</tok>
</xsl:template>
</xsl:stylesheet>
Applied to this input (corrected for well-formedness):
<root>
<tokens>
<token ID="t1">blah</token>
<token ID="t2">blabla</token>
<token ID="t3">shovel</token>
</tokens>
<relatedStuff>
<group gID="s1">
<references tokID="t1" />
<references tokID="t2" />
</group>
<group gID="s2">
<references tokID="t3" />
</group>
</relatedStuff>
</root>
Produces:
<s id="everything_merged">
<tok id="t1" gID="s1">blah</tok>
<tok id="t2" gID="s1">blabla</tok>
<tok id="t3" gID="s2">shovel</tok>
</s>
A solution using keys and pure "push style":
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kgIDfromTokId" match="@gID"
use="../*/@tokID"/>
<xsl:template match="tokens">
<s id="everything_merged">
<xsl:apply-templates/>
</s>
</xsl:template>
<xsl:template match="token">
<tok id="{@ID}" gID="{key('kgIDfromTokId', @ID)}">
<xsl:apply-templates/>
</tok>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<root>
<tokens>
<token ID="t1">blah</token>
<token ID="t2">blabla</token>
<token ID="t3">shovel</token>
</tokens>
<relatedStuff>
<group gID="s1">
<references tokID="t1" />
<references tokID="t2" />
</group>
<group gID="s2">
<references tokID="t3" />
</group>
</relatedStuff>
</root>
the wanted, correct result is produced:
<s id="everything_merged">
<tok id="t1" gID="s1">blah</tok>
<tok id="t2" gID="s1">blabla</tok>
<tok id="t3" gID="s2">shovel</tok>
</s>

XSL efficiency problem - need solution

I've got an interesting XSL scenario to run by you guys. So far my solutions seem to be inefficient (noticeable increase in transformation time), so I thought I'd put it out there.
The scenario
From the following XML we need to get the id of latest news item for each category.
The XML
In the XML I have a list of news items, a list of news categories and a list of item category relationships. Both the item list and item category list may as well be in random order (not date ordered).
<news>
<itemlist>
<item id="1">
<attribute name="title">Great new products</attribute>
<attribute name="startdate">2009-06-13T00:00:00</attribute>
</item>
<item id="2">
<attribute name="title">FTSE down</attribute>
<attribute name="startdate">2009-10-01T00:00:00</attribute>
</item>
<item id="3">
<attribute name="title">SAAB go under</attribute>
<attribute name="startdate">2008-01-22T00:00:00</attribute>
</item>
<item id="4">
<attribute name="title">M&amp;A on increase</attribute>
<attribute name="startdate">2010-05-11T00:00:00</attribute>
</item>
</itemlist>
<categorylist>
<category id="1">
<name>Finance</name>
</category>
<category id="2">
<name>Environment</name>
</category>
<category id="3">
<name>Health</name>
</category>
</categorylist>
<itemcategorylist>
<itemcategory itemid="1" categoryid="2" />
<itemcategory itemid="2" categoryid="3" />
<itemcategory itemid="3" categoryid="1" />
<itemcategory itemid="4" categoryid="1" />
<itemcategory itemid="4" categoryid="2" />
<itemcategory itemid="2" categoryid="2" />
</itemcategorylist>
</news>
What I've tried
Using an RTF (result tree fragment)
<xsl:template match="/">
<!-- for each category -->
<xsl:for-each select="/news/categorylist/category">
<xsl:variable name="categoryid" select="@id"/>
<!-- create RTF item list containing only items in that list ordered by startdate -->
<xsl:variable name="ordereditemlist">
<xsl:for-each select="/news/itemlist/item">
<xsl:sort select="attribute[@name='startdate']" order="descending" data-type="text"/>
<xsl:variable name="itemid" select="@id" />
<xsl:if test="/news/itemcategorylist/itemcategory[@categoryid = $categoryid][@itemid=$itemid]">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
<!-- get the id of the first item in the list -->
<xsl:variable name="firstitemid" select="msxsl:node-set($ordereditemlist)/item[position()=1]/@id"/>
</xsl:for-each>
</xsl:template>
Would really appreciate any ideas you have.
Thanks,
Alex
Here is how I would do it:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output encoding="utf-8" />
<!-- this is (literally) the key to the solution -->
<xsl:key name="kItemByItemCategory" match="item" use="
/news/itemcategorylist/itemcategory[@itemid = current()/@id]/@categoryid
" />
<xsl:template match="/news">
<latest>
<xsl:apply-templates select="categorylist/category" mode="latest" />
</latest>
</xsl:template>
<xsl:template match="category" mode="latest">
<xsl:variable name="self" select="." />
<!-- sorted loop to get the latest news item -->
<xsl:for-each select="key('kItemByItemCategory', @id)">
<xsl:sort select="attribute[@name='startdate']" order="descending" />
<xsl:if test="position() = 1">
<category name="{$self/name}">
<xsl:apply-templates select="." />
</category>
</xsl:if>
</xsl:for-each>
</xsl:template>
<xsl:template match="item">
<!-- for the sake of the example, just copy the node -->
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
The <xsl:key> indexes each news item by the associated category ID. Now you have a simple way of retrieving all the news items that belong to a certain category. The rest is straight-forward.
Output for me:
<latest>
<category name="Finance">
<item id="4">
<attribute name="title">M&amp;A on increase</attribute>
<attribute name="startdate">2010-05-11T00:00:00</attribute>
</item>
</category>
<category name="Environment">
<item id="4">
<attribute name="title">M&amp;A on increase</attribute>
<attribute name="startdate">2010-05-11T00:00:00</attribute>
</item>
</category>
<category name="Health">
<item id="2">
<attribute name="title">FTSE down</attribute>
<attribute name="startdate">2009-10-01T00:00:00</attribute>
</item>
</category>
</latest>
It looks like you should explore <xsl:key>. This effectively creates a hashmap and avoids looping through everything.
Update: here is a typical tutorial:
http://www.learn-xslt-tutorial.com/Working-with-Keys.cfm
You're looping through all items and sorting them by date before you throw most of them away because they are not in the correct category.
Maybe something like this might be more suitable in your case:
<xsl:variable name="ordereditemlist">
<xsl:for-each select="/news/itemcategorylist/itemcategory[@categoryid = $categoryid]">
<xsl:variable name="itemid" select="@itemid"/>
And continue from there to gather only the news items that you actually require, then sort and copy them.
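One way the fragment above might be continued is sketched below. It is not necessarily what was intended: the key kItemById and the output elements latest and category are assumptions made for illustration. The idea is to loop only over the itemcategory entries of the current category, sort them by the start date of the item they point to, and keep the first one.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key: look up a news item by its id -->
  <xsl:key name="kItemById" match="item" use="@id"/>

  <xsl:template match="/news">
    <latest>
      <xsl:for-each select="categorylist/category">
        <xsl:variable name="categoryid" select="@id"/>
        <!-- Visit only the relationships that belong to this category -->
        <xsl:for-each select="/news/itemcategorylist/itemcategory[@categoryid = $categoryid]">
          <!-- ISO timestamps sort correctly as text, newest first -->
          <xsl:sort select="key('kItemById', @itemid)/attribute[@name='startdate']"
                    order="descending"/>
          <xsl:if test="position() = 1">
            <category id="{$categoryid}" latestitemid="{@itemid}"/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </latest>
  </xsl:template>

</xsl:stylesheet>
```

This avoids both the intermediate result tree fragment and the msxsl:node-set() call, since only the id of the first sorted entry is needed.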