XSLT - Remove duplicates from mapped results - xslt

This isn't quite the same as the threads on removing duplicates I've found on this forum.
I have a key/value map and I want to remove duplicates from the final results of the mapping.
Source Document:
<article>
<subject code="T020-060"/>
<subject code="T020-010"/>
<subject code="T090"/>
</article>
Mapping:
<xsl:variable name="topicalMap">
<topic MapCode="T020-060">Value 1</topic>
<topic MapCode="T020-010">Value 1</topic>
<topic MapCode="T090">Value 3</topic>
</xsl:variable>
Desired Result:
<article>
<topic>Value 1</topic>
<topic>Value 3</topic>
</article>
XSLT I'm working with (note, it has testing tags and code to make sure the mapping works):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" encoding="utf8" indent="yes" exclude-result-prefixes="#all"/>
<xsl:template match="article">
<article>
<xsl:for-each-group select="subject" group-by="$topicalMap/topic[@MapCode = @code]">
<test-group>
<code>Current code: <xsl:value-of select="@code"/></code>
<topic>Current keyword: <xsl:value-of
select="$topicalMap/topic[@MapCode = @code]"/></topic>
</test-group>
</xsl:for-each-group>
<simple-mapping><xsl:apply-templates/></simple-mapping>
</article>
</xsl:template>
<!-- Simple Mapping Topics -->
<xsl:template match="subject">
<xsl:variable name="ArticleCode" select="@code"/>
<topic>
<xsl:value-of select="$topicalMap/topic[@MapCode = $ArticleCode]"/>
</topic>
</xsl:template>
<!-- Keyword Map -->
<xsl:variable name="topicalMap">
<topic MapCode="T020-060">Value 1</topic>
<topic MapCode="T020-010">Value 1</topic>
<topic MapCode="T090">Value 3</topic>
</xsl:variable>
</xsl:stylesheet>
Doing the group-by that way produces nothing. If I duplicate the topics in the source document and use group-by="@code", that removes the duplicates before the mapping is applied. But I want to remove duplicate resulting values, not duplicate keys.
Simple-mapping stuff is just to show working code.

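Inside the predicate, @code is evaluated relative to the topic element, not the subject being grouped, so the original expression never matches. Use current() to refer back to the subject:
<xsl:for-each-group select="subject" group-by="$topicalMap/topic[@MapCode = current()/@code]">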
<topic>
<xsl:value-of select="current-grouping-key()"/>
</topic>
</xsl:for-each-group>
or better yet
<xsl:key name="map" match="topic" use="@MapCode"/>
<xsl:template match="article">
<article>
<xsl:for-each-group select="subject" group-by="key('map', @code, $topicalMap)">
<topic>
<xsl:value-of select="current-grouping-key()"/>
</topic>
</xsl:for-each-group>
</article>
</xsl:template>
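If no per-group processing is needed, a different route is to look up all the mapped values at once and de-duplicate them with distinct-values(). The following is only a minimal sketch under that assumption; it reuses the key and the topicalMap variable from above and is not part of the original answer.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Keyword map, copied from the question -->
  <xsl:variable name="topicalMap">
    <topic MapCode="T020-060">Value 1</topic>
    <topic MapCode="T020-010">Value 1</topic>
    <topic MapCode="T090">Value 3</topic>
  </xsl:variable>

  <!-- Index the map entries by their code -->
  <xsl:key name="map" match="topic" use="@MapCode"/>

  <xsl:template match="article">
    <article>
      <!-- key() looks up every subject code in the map in one call;
           distinct-values() atomizes the topics and drops repeated values -->
      <xsl:for-each select="distinct-values(key('map', subject/@code, $topicalMap))">
        <topic><xsl:value-of select="."/></topic>
      </xsl:for-each>
    </article>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should give one topic for "Value 1" and one for "Value 3". Note that the order of the items returned by distinct-values() is implementation-dependent, so xsl:for-each-group remains the safer choice when the output order matters.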

Related

Sorting grouped items using XSLT 1.0

I've got a rather large XML file that contains a list of vehicle models, their price and a monthly payment price. There is actually loads of other information in there, but those are the salient bits of data I'm interested in.
There are loads of duplicate models in there at different prices/monthly payments. My task, which I have done so far with the help of this forum, is to construct a distinct list of the models (i.e. no duplicates), showing the lowest priced vehicle in each model.
The bit I'm stuck on is that I then need to sort this list from the lowest monthly payment to the highest. The complication is that the lowest priced vehicle doesn't always have the lowest monthly payment.
My XML looks a bit like this:
<?xml version="1.0" encoding="utf-8"?>
<Dealer>
<Vehicle>
<Model>KA</Model>
<DealerPriceNoFormat>8700.00</DealerPriceNoFormat>
<OptionsFinanceMonthlyPayment>300.50</OptionsFinanceMonthlyPayment>
</Vehicle>
<Vehicle>
<Model>KA</Model>
<DealerPriceNoFormat>10000.50</DealerPriceNoFormat>
<OptionsFinanceMonthlyPayment>270.50</OptionsFinanceMonthlyPayment>
</Vehicle>
<Vehicle>
<Model>Focus</Model>
<DealerPriceNoFormat>12000.00</DealerPriceNoFormat>
<OptionsFinanceMonthlyPayment>340.00</OptionsFinanceMonthlyPayment>
</Vehicle>
<Vehicle>
<Model>KA</Model>
<DealerPriceNoFormat>9910.00</DealerPriceNoFormat>
<OptionsFinanceMonthlyPayment>430.75</OptionsFinanceMonthlyPayment>
</Vehicle>
<Vehicle>
<Model>KUGA</Model>
<DealerPriceNoFormat>23010.00</DealerPriceNoFormat>
<OptionsFinanceMonthlyPayment>550.20</OptionsFinanceMonthlyPayment>
</Vehicle>
<Vehicle>
<Model>Focus</Model>
<DealerPriceNoFormat>15900.00</DealerPriceNoFormat>
<OptionsFinanceMonthlyPayment>430.00</OptionsFinanceMonthlyPayment>
</Vehicle>
</Dealer>
As I said, there is loads of other data in there, but that's the basic structure.
And my XSLT looks like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="html" omit-xml-declaration="yes" indent="yes" version="4.0" encoding="iso-8859-1" />
<xsl:key name="by-id" match="Dealer/Vehicle" use="Model"/>
<xsl:template match="Dealer">
<xsl:copy>
<xsl:for-each select="Vehicle[generate-id() = generate-id(key('by-id', Model)[1])]">
<xsl:for-each select="key('by-id', Model)">
<xsl:sort select="DealerPriceNoFormat" data-type="number" order="ascending" />
<xsl:if test="position()=1">
<p>
<xsl:value-of select="Model" /><br />
<xsl:value-of select="DealerPriceNoFormat" /><br />
<xsl:value-of select="OptionsFinanceMonthlyPayment" />
</p>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Like I say, I'm almost there; I just can't figure out how to then sort the output list by OptionsFinanceMonthlyPayment.
So in the case above the output would look something like this, showing the cheapest car in each model, but sorted by monthly payment:
KA
8700.00
300.50
Focus
12000.00
340.00
KUGA
23010.00
550.20
Thanks in advance.
I would do this in two passes - something like:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:key name="vehicle-by-model" match="Vehicle" use="Model"/>
<xsl:template match="/Dealer">
<!-- first-pass -->
<xsl:variable name="groups">
<xsl:for-each select="Vehicle[generate-id() = generate-id(key('vehicle-by-model', Model)[1])]">
<group>
<xsl:for-each select="key('vehicle-by-model', Model)">
<xsl:sort select="DealerPriceNoFormat" data-type="number" order="ascending" />
<xsl:if test="position()=1">
<model><xsl:value-of select="Model" /></model>
<price><xsl:value-of select="DealerPriceNoFormat" /></price>
<pmt><xsl:value-of select="OptionsFinanceMonthlyPayment" /></pmt>
</xsl:if>
</xsl:for-each>
</group>
</xsl:for-each>
</xsl:variable>
<!-- output -->
<html>
<xsl:for-each select="exsl:node-set($groups)/group">
<xsl:sort select="pmt" data-type="number" order="ascending" />
<p>
<xsl:value-of select="model" /><br />
<xsl:value-of select="price" /><br />
<xsl:value-of select="pmt" />
</p>
</xsl:for-each>
</html>
</xsl:template>
</xsl:stylesheet>
Note that this requires a processor that supports a node-set() extension function.

How to get max valued xref's 'rid' except from particular section

Please suggest how to get the maximum 'rid' value from all xrefs except those in the 'Online' section. Once the maximum 'rid' is identified, I need to add an attribute to those references whose number is higher than that maximum. Please see the required result below.
XML:
<article>
<body>
<sec><title>Sections</title>
<p>The test <xref rid="b1">1</xref>, <xref rid="b2">2</xref>, <xref rid="b3 b4 b5">3-5</xref></p></sec>
<sec><title>Online</title><!--This section's xrefs no need to consider-->
<p>The test <xref rid="b6">6</xref></p>
<sec><title>Other</title>
<p><xref rid="b1">1</xref>, <xref rid="b7 b8">7-8</xref></p>
</sec>
</sec><!--This section's xrefs no need to consider-->
<sec>
<p>Final test test</p>
<sec><title>Third title</title><p>Last text</p></sec>
</sec>
</body>
<bm>
<ref id="b1">The ref01</ref>
<ref id="b2">The ref02</ref>
<ref id="b3">The ref03</ref>
<ref id="b4">The ref04</ref>
<ref id="b5">The ref05</ref>
<ref id="b6">The ref06</ref>
<ref id="b7">The ref07</ref>
<ref id="b8">The ref08</ref>
</bm>
</article>
XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:variable name="var1"><!--Variable to get the all 'rid's except sec/title contains 'Online' -->
<xsl:for-each select="//xref[not(. is ancestor::sec[title[contains(., 'Online')]]/descendant-or-self)]/@rid">
<!--xsl:for-each select="//xref/@rid[not(contains(ancestor::sec/title, 'Online'))]"--><!--for this xpath, error is : "XPTY0004: A sequence of more than one item is not allowed as the first argument" -->
<!--xsl:for-each select="//xref/@rid[not(contains(ancestor::sec[1]/title, 'Online')) and not(contains(ancestor::sec[2]/title, 'Online'))]"--><!--for this xpath we are getting the required result, but there may be several nesting of 'sec's -->
<xsl:choose>
<xsl:when test="contains(., ' ')">
<xsl:for-each select="tokenize(., ' ')">
<a><xsl:value-of select="."/></a>
</xsl:for-each>
</xsl:when>
<xsl:otherwise><a><xsl:value-of select="."/></a></xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="varMax1">
<xsl:for-each select="$var1/a">
<xsl:sort select="substring-after(., 'b')" order="descending" data-type="number"/>
<a><xsl:value-of select="."/></a>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="varMax"><!--Variable to get max valued RID -->
<xsl:value-of select="substring-after($varMax1/a[1], 'b')"/>
</xsl:variable>
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
<xsl:template match="ref">
<xsl:variable name="varID"><xsl:value-of select="substring-after(@id, 'b')"/></xsl:variable>
<xsl:choose>
<xsl:when test="number($varMax) lt number($varID)">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:attribute name="MoveRef">yes</xsl:attribute>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Required result:
<article>
<body>
<sec><title>Sections</title>
<p>The test <xref rid="b1">1</xref>, <xref rid="b2">2</xref>, <xref rid="b3 b4 b5">3-5</xref></p></sec>
<sec><title>Online</title><!--This section's xrefs no need to consider-->
<p>The test <xref rid="b6">6</xref></p>
<sec><title>Other</title>
<p><xref rid="b1">1</xref>, <xref rid="b7">7</xref>, <xref rid="b8">8</xref></p>
</sec>
</sec><!--This section's xrefs no need to consider-->
<sec>
<p>Final test test</p>
<sec><title>Third title</title><p>Last text</p></sec>
</sec>
</body>
<bm>
<ref id="b1">The ref01</ref>
<ref id="b2">The ref02</ref>
<ref id="b3">The ref03</ref>
<ref id="b4">The ref04</ref>
<ref id="b5">The ref05</ref>
<ref id="b6" MoveRef="yes">The ref06</ref>
<ref id="b7" MoveRef="yes">The ref07</ref>
<ref id="b8" MoveRef="yes">The ref08</ref>
</bm>
</article>
Here, treat 'b5' as the number 5, 'b6' as 6, and so on (the rid values are alphanumeric).
Perhaps you can take a different approach rather than trying to find the maximum rid attribute that is not in an "online" section. Not least because it is not entirely clear what the maximum is when you are dealing with an alphanumeric string.
Instead, you could define a key to look up elements in the "online" section by their name
<xsl:key name="online" match="sec[title = 'Online']//*" use="name()" />
And then, another key, to look up the xref elements that occur in other sections
<xsl:key name="other" match="xref[not(ancestor::sec/title = 'Online')]" use="name()" />
Then, you can write a template to match the ref elements, and use an xsl:if to determine whether to add a MoveRef attribute:
<xsl:variable name="id" select="@id" />
<xsl:if test="key('online', 'xref')[tokenize(@rid, ' ')[. = $id]] and not(key('other', 'xref')[tokenize(@rid, ' ')[. = $id]])">
Try this much shorter XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="yes" />
<xsl:key name="online" match="sec[title = 'Online']//*" use="name()" />
<xsl:key name="other" match="xref[not(ancestor::sec/title = 'Online')]" use="name()" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ref">
<ref>
<xsl:variable name="id" select="@id" />
<xsl:if test="key('online', 'xref')[tokenize(@rid, ' ')[. = $id]] and not(key('other', 'xref')[tokenize(@rid, ' ')[. = $id]])">
<xsl:attribute name="MoveRef" select="'yes'" />
</xsl:if>
<xsl:apply-templates select="@*|node()"/>
</ref>
</xsl:template>
</xsl:stylesheet>
You can actually amend the ref template to put the condition in the template match, if you wanted...
<xsl:template match="ref[key('online', 'xref')[tokenize(@rid, ' ')[. = current()/@id]] and not(key('other', 'xref')[tokenize(@rid, ' ')[. = current()/@id]])]">
<ref MoveRef="yes">
<xsl:apply-templates select="@*|node()"/>
</ref>
</xsl:template>

Extract text from "para" tag with embedded "para" children?

I'm using Altova's command-line xml processor on Windows to process a Help & Manual xml file. Help & Manual is help authoring software.
I'm extracting the text content from it using the following xslt. Specifically, I'm having an issue with the final para rule:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*" />
<xsl:template match="para[@styleclass='Heading1']">
<xsl:text>====== </xsl:text>
<xsl:value-of select="." />
<xsl:text> ======
</xsl:text>
</xsl:template>
<xsl:template match="para[@styleclass='Heading2']">
<xsl:text>===== </xsl:text>
<xsl:value-of select="." />
<xsl:text> =====
</xsl:text>
</xsl:template>
<xsl:template match="para">
<xsl:value-of select="." />
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="toggle">
<xsl:text>**</xsl:text>
<xsl:apply-templates />
<xsl:text>**
</xsl:text>
</xsl:template>
<xsl:template match="title" />
<xsl:template match="topic">
<xsl:apply-templates select="body" />
</xsl:template>
<xsl:template match="body">
<xsl:text>Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
</xsl:text>
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
I've run into an issue with the extraction of text from certain paragraph elements. Take for example this xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="../helpproject.xsl" ?>
<topic template="Default" lasteditedby="tlilley" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../helpproject.xsd">
<title translate="true">New Installs</title>
<keywords>
<keyword translate="true">Regional and Language Options</keyword>
</keywords>
<body>
<header>
<para styleclass="Heading1"><text styleclass="Heading1" translate="true">New Installs</text></para>
</header>
<para styleclass="Normal"><table rowcount="1" colcount="2" style="width:100%; cell-padding:6px; cell-spacing:0px; page-break-inside:auto; border-width:1px; border-spacing:0px; cell-border-width:0px; border-color:#000000; border-style:solid; background-color:#fffff0; head-row-background-color:none; alt-row-background-color:none;">
<tr style="vertical-align:top">
<td style="vertical-align:middle; width:96px; height:103px;">
<para styleclass="Normal" style="text-align:center;"><image src="books.png" scale="100.00%" styleclass="Image Caption"></image></para>
</td>
<td style="vertical-align:middle; width:1189px; height:103px;">
<para styleclass="Callouts"><text styleclass="Callouts" style="font-weight:bold;" translate="true">Documentation Convention</text></para>
<para styleclass="Callouts"><text styleclass="Callouts" translate="true">To make the examples concrete, we refer to the </text><var styleclass="Callouts">Add2Exchange</var><text styleclass="Callouts" translate="true"> Service Account as "zAdd2Exchange" throughout this document.  If your Service Account name is different, substitute that value for "zAdd2Exchange" in all commands and examples.  If you have named your account according to the recommended "zAdd2Exchange", then you may cut and paste any given commands as is.</text></para>
</td>
</tr>
</table></para>
</body>
</topic>
When the xslt is run on that paragraph, it pulls the text out but does so at the top paragraph element. The transform is supposed to add a pair of newlines to all extracted paragraphs, but doesn't have a chance to do so on the embedded <para> elements because the text is extracted at the parent para element.
Note that I don't care about the table tags, I just want to strip those.
Is there a way to construct the para rule so that it properly extracts the directly-owned text of a para element, as well as the text of any children para's, such that each extracted chunk gets the rule's newlines in the output text?
I think I've found the answer. In the last para rule I'm now using apply-templates instead of value-of, and that seems to catch them all.
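A sketch of what that change to the final para rule might look like. Only that one rule is shown, and the pair of newlines is written here with &#10; character references, which is an assumption; the rule in the question uses literal line breaks inside xsl:text.

```xml
<!-- Generic para rule: recurse with apply-templates instead of flattening
     the whole subtree with value-of, so nested para elements are matched
     by this same rule and each one gets its own trailing newlines. -->
<xsl:template match="para">
  <xsl:apply-templates />
  <xsl:text>&#10;&#10;</xsl:text>
</xsl:template>
```

Elements without a rule of their own (table, tr, td, text, var) fall through to the built-in template rules, which simply keep recursing and copy text nodes to the output, so the table markup is dropped without any extra rules. The outer para still emits its own newlines after the nested paragraphs have been written.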

XSLT matching PAGEID to an element ID

How would I match two separate numbers in an XML document? There are multiple <PgIndexElementInfo> elements in my XML document, each representing a different navigation element, each with a unique <ID>. Later in the document a <PageID> specifies a number that sometimes matches an <ID> used above. How could I go about matching the <PageID> to the <ID> specified above?
<Element>
<Content>
<PgIndexElementInfo>
<ElementData>
<Items>
<PgIndexElementItem>
<ID>1455917</ID>
</PgIndexElementItem>
</Items>
</ElementData>
</PgIndexElementInfo>
</Content>
</Element>
<Element>
<Content>
<CustomElementInfo>
<PageID>1455917</PageID>
</CustomElementInfo>
</Content>
</Element>
EDIT:
I added the solution below to my code. The xsl:apply-templates that is present is used to recreate the nested lists that are lost between HTML and XML. What I now need to do is match the PageID to the ID of a <PgIndexElementItem> and add a CSS class to the <ul> it is a part of. I hope that makes sense.
<xsl:key name="kIDByValue" match="ID" use="."/>
<xsl:template match="PageID[key('kIDByValue',.)]">
<xsl:apply-templates select="//PgIndexElementItem[not(contains(Description, '.'))]" />
</xsl:template>
<xsl:template match="PgIndexElementItem">
<li>
<xsl:value-of select="Title"/>
<xsl:variable name="prefix" select="concat(Description, '.')"/>
<xsl:variable name="childOptions"
select="../PgIndexElementItem[starts-with(Description, $prefix)
and not(contains(substring-after(Description, $prefix), '.'))]"/>
<xsl:if test="$childOptions">
<ul>
<xsl:apply-templates select="$childOptions" />
</ul>
</xsl:if>
</li>
</xsl:template>
The XSLT way for dealing with cross references is with keys.
Matching: A rule matching every PageID element whose value is also the value of some ID element.
<xsl:key name="kIDByValue" match="ID" use="."/>
<xsl:template match="PageID[key('kIDByValue',.)]">
<!-- Template content -->
</xsl:template>
Selecting: An expression selecting every PageID element with a specific value.
<xsl:key name="kPageIDByValue" match="PageID" use="."/>
<xsl:template match="ID">
<xsl:apply-templates select="key('kPageIDByValue',.)"/>
</xsl:template>

XSLT - Grouping same items

I have the following XML:
<Info>
<Name>Dan</Name>
<Age>24</Age>
</Info>
<Info>
<Name>Tom</Name>
<Age>15</Age>
</Info>
<Info>
<Name>Dan</Name>
<Age>24</Age>
</Info>
<Info>
<Name>James</Name>
<Age>18</Age>
</Info>
And I need to produce the following HTML:
<ul class="data">
<li>Dan</li>
<li>Dan</li>
</ul>
<ul class="data">
<li>James</li>
</ul>
<ul class="data">
<li>Tom</li>
<li>Tom</li>
</ul>
As well as producing the output, it needs to sort by Name. Any help appreciated. I started by looking at group-by but couldn't work out how to finish it:
Pretty sure it's wrong?
<xsl:for-each-group select="Info" group-by="@Name">??????
<xsl:for-each-group select="current-group()" sort-by="@Name">
I don't have an XSLT 2.0 parser, but to do it in XSLT 1.0 at least you could use Muenchian Grouping to do this...
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="utf-8" method="html" version="1.0"/>
<xsl:key name="Names" match="Name" use="."/>
<xsl:template match="/">
<xsl:for-each select="//Info[generate-id(Name) = generate-id(key('Names', Name)[1])]">
<xsl:sort select="Name" />
<ul class="data">
<xsl:for-each select="key('Names', Name)">
<li><xsl:value-of select="." /></li>
</xsl:for-each>
</ul>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
I'm not sure why your result has two 'Tom' elements, I assume you have an extra node in the XML that you didn't provide in your sample.
Anyway, the XSLT would look something like this:
<xsl:for-each-group select="Info" group-by="Name">
<xsl:sort select="current-grouping-key()"/>
<ul class="data">
<xsl:for-each select="current-group()/Name">
<li><xsl:value-of select="." /></li>
</xsl:for-each>
</ul>
</xsl:for-each-group>
I don't have an XSLT 2.0 parser handy to test it, but I think that should work.
I don't have an XSLT 2.0 processor to test this, but at the very least you will need to change "@Name" to just "Name", since it is a child element, not an attribute.