XSLT Muenchian Grouping on different elements based on a common attribute

I am given XML similar to the following that I need to process.
<root>
<Header/>
<Customer id="1" date="13/04/2014"/>
<Account id="1" date="14/04/2014"/>
<Account id="1" date="01/06/2015"/>
<Address id="1" date="14/04/2014"/>
<Customer id="2" date="12/08/2015"/>
<Account id="2" date="13/08/2015"/>
<Address id="2" date="13/08/2015"/>
<Address id="2" date="03/09/2015"/>
<Address id="2" date="27/01/2017"/>
<Customer id="3" date="04/10/2015"/>
<Customer id="3" date="01/02/2017"/>
<Account id="3" date="05/10/2015"/>
<Address id="3" date="08/10/2015"/>
<Address id="3" date="03/09/2016"/>
</root>
All of the nodes have more attributes but I stripped them off. Each element has an id and a date. If there are duplicate elements with the same id, the one with the most recent date is considered valid and the older ones should be ignored.
If the older ones can be stripped out in the same pass, I would like the output to look something like this:
<Customers>
<Customer id="1">
<Account/>
<Address/>
</Customer>
<Customer id="2">
<Account/>
<Address/>
</Customer>
<Customer id="3">
<Account/>
<Address/>
</Customer>
</Customers>
If not, it is fine to process the file in two transforms: one to group the elements by customer id, leaving each customer with multiple Account/Address elements, and a second to remove the older entries.
The source XML has close to a million entries, so performance is an issue. A transform that takes a few minutes is fine, but anything over 15 minutes will not work.
I currently have the following XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="nodes-by-id" match="//root/*" use="@id"/>
<xsl:template match="root">
<Customers>
<xsl:for-each select="*[count(. | key('nodes-by-id', @id)[1]) = 1]">
<xsl:variable name="current-grouping-key" select="@id"/>
<xsl:variable name="current-group" select="key('nodes-by-id', $current-grouping-key)"/>
<Customer>
<xsl:attribute name="id">
<xsl:value-of select="$current-grouping-key"/>
</xsl:attribute>
<CustomerElements>
<xsl:for-each select="$current-group/Customer">
<CustomerElement>
<xsl:attribute name="date">
<xsl:value-of select="@date"/>
</xsl:attribute>
</CustomerElement>
</xsl:for-each>
</CustomerElements>
<xsl:apply-templates select="$current-group"/>
</Customer>
</xsl:for-each>
</Customers>
</xsl:template>
</xsl:stylesheet>
Currently this just tries to group all of the elements by their id, then output all of the Customer elements. I get the following:
<Customers>
<Customer id="">
<CustomerElements/>
</Customer>
<Customer id="1">
<CustomerElements/>
</Customer>
<Customer id="2">
<CustomerElements/>
</Customer>
<Customer id="3">
<CustomerElements/>
</Customer>
</Customers>
I get the customer with the blank ID because I don't ignore the header row. My real question is why does the $current-group variable not contain any elements?
I would also appreciate any tips on how to ignore the header row and how to filter out the entries with older dates.

I got everything sorted. This is a segment of the XSLT I used. More info in the XML comments.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="nodes-by-id" match="//root/*" use="@id"/>
<xsl:template match="PR-030">
<CustomerMeters>
<!-- Using select="Customer[cou... instead of select="*[cou... will cause it to ignore the header. However, it requires
the Customer element to be the first element for the icp in the XML. -->
<xsl:for-each select="Customer[count(. | key('nodes-by-id', @id)[1]) = 1]">
<xsl:variable name="current-grouping-key" select="@id"/>
<xsl:variable name="current-group" select="key('nodes-by-id', $current-grouping-key)"/>
<xsl:variable name="current-group-sorted">
<!-- If we sort all nodes by date order, then we can fetch the first Address/Customer/etc... from this group and we will have the latest-->
<xsl:for-each select="$current-group">
<!-- year -->
<xsl:sort select="substring(@date, 7, 4)" order="descending" data-type="number"/>
<!-- month -->
<xsl:sort select="substring(@date, 4, 2)" order="descending" data-type="number"/>
<!-- day -->
<xsl:sort select="substring(@date, 1, 2)" order="descending" data-type="number"/>
<xsl:copy-of select="current()" />
</xsl:for-each>
</xsl:variable>
<Customer>
<!-- In here I can get what I want from the current-group-sorted variable. -->
<!-- Because they are in date order, I can just take the first occurrence and it will be the most recent. -->
<someField>
<xsl:value-of select="$current-group-sorted/*[self::Account][1]/@someAttribute"/>
</someField>
</Customer>
</xsl:for-each>
</CustomerMeters>
</xsl:template>
</xsl:stylesheet>
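On the original question: the group is not actually empty. `$current-group/Customer` selects Customer children of the grouped nodes, and those nodes have no children; `$current-group[self::Customer]` selects the Customer members of the group itself. Note also that `$current-group-sorted/*` above navigates into a result tree fragment, which a strict XSLT 1.0 processor only allows after conversion with a node-set extension function. The following is a minimal sketch, not the stylesheet actually used, of one way to avoid the temporary tree in plain XSLT 1.0. It assumes the sample input from the question (a root element, dd/mm/yyyy dates); the key names customer-by-id and detail-by-name-and-id and the named template latest are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Customers indexed by id. Header is never indexed, so it is skipped. -->
  <xsl:key name="customer-by-id" match="root/Customer" use="@id"/>
  <!-- Accounts and Addresses indexed by element name plus id (hypothetical key). -->
  <xsl:key name="detail-by-name-and-id" match="root/Account | root/Address"
           use="concat(name(), '|', @id)"/>

  <xsl:template match="root">
    <Customers>
      <!-- Muenchian step: keep the first Customer in document order for each id. -->
      <xsl:for-each select="Customer[generate-id() = generate-id(key('customer-by-id', @id)[1])]">
        <Customer id="{@id}">
          <xsl:call-template name="latest">
            <xsl:with-param name="nodes"
                            select="key('detail-by-name-and-id', concat('Account|', @id))"/>
          </xsl:call-template>
          <xsl:call-template name="latest">
            <xsl:with-param name="nodes"
                            select="key('detail-by-name-and-id', concat('Address|', @id))"/>
          </xsl:call-template>
        </Customer>
      </xsl:for-each>
    </Customers>
  </xsl:template>

  <!-- Copies the node in $nodes that has the most recent dd/mm/yyyy date. -->
  <xsl:template name="latest">
    <xsl:param name="nodes"/>
    <xsl:for-each select="$nodes">
      <xsl:sort select="substring(@date, 7, 4)" data-type="number" order="descending"/>
      <xsl:sort select="substring(@date, 4, 2)" data-type="number" order="descending"/>
      <xsl:sort select="substring(@date, 1, 2)" data-type="number" order="descending"/>
      <!-- position() refers to the sorted order, so 1 is the latest node. -->
      <xsl:if test="position() = 1">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each sort here only touches the handful of nodes that share one element name and id, and the copied Account and Address elements keep all of their attributes.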

Verticalize XML using XSLT

I am trying to implement a transformation that looked simple at first, but everything I have tried has failed.
The XML is generated from a fixed-length record and has the format below.
<?xml version="1.0" encoding="UTF-8"?>
<record>
<no_of_records>30</no_of_records>
<cust_lastname_1>Smith</cust_lastname_1>
<cust_name_1>John</cust_name_1>
<cust_id_1>X45</cust_id_1>
<cust_lastname_2>George</cust_lastname_2>
<cust_name_2>Michael</cust_name_2>
<cust_id_2>X76</cust_id_2>
<cust_lastname_3>Ria</cust_lastname_3>
<cust_name_3>Chris</cust_name_3>
<cust_id_3>C87</cust_id_3>
...
</record>
The no_of_records element indicates how many sets of _X-suffixed elements each record contains; because the record originates from a fixed-length format, there is a defined maximum.
I want to transform it to a “verticalized” form resembling the one below.
<record>
<customer num="1">
<lastname>Smith</lastname>
<name>John</name>
<id>X45</id>
</customer>
<customer num="2">
<lastname>George</lastname>
<name>Michael</name>
<id>X76</id>
</customer>
<customer num="3">
<lastname>Ria</lastname>
<name>Chris</name>
<id>C87</id>
...
</customer>
</record>
Any help would be greatly appreciated.
In XSLT 2.0, you want something like
<xsl:for-each-group select="*" group-starting-with="*[starts-with(local-name(), 'cust_lastname')]">
<customer num="{position()}">
<xsl:apply-templates select="current-group()"/>
</customer>
</xsl:for-each-group>
....
<xsl:template match="*[starts-with(local-name(), 'cust')]">
<xsl:element name="{replace(local-name(), 'cust_(.*?)_[0-9]+', '$1')}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
The solution from @Michael Kay works fine. Thank you!
XML
<?xml version="1.0" encoding="UTF-8"?>
<record>
<no_of_records>3</no_of_records>
<cust_lastname_1>Smith</cust_lastname_1>
<cust_name_1>John</cust_name_1>
<cust_id_1>X45</cust_id_1>
<cust_lastname_2>George</cust_lastname_2>
<cust_name_2>Michael</cust_name_2>
<cust_id_2>X76</cust_id_2>
<cust_lastname_3>Ria</cust_lastname_3>
<cust_name_3>Chris</cust_name_3>
<cust_id_3>C87</cust_id_3>
</record>
XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="record">
<records>
<xsl:for-each-group select="*[starts-with(local-name(), 'cust_')]"
group-starting-with="*[starts-with(local-name(), 'cust_lastname')]">
<customer num="{position()}">
<xsl:apply-templates select="current-group()"/>
</customer>
</xsl:for-each-group>
</records>
</xsl:template>
<xsl:template match="*[starts-with(local-name(), 'cust')]">
<xsl:element name="{replace(local-name(), 'cust_(.*?)_[0-9]+', '$1')}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Result
<?xml version="1.0" encoding="UTF-8"?>
<records>
<customer num="1">
<lastname>Smith</lastname>
<name>John</name>
<id>X45</id>
</customer>
<customer num="2">
<lastname>George</lastname>
<name>Michael</name>
<id>X76</id>
</customer>
<customer num="3">
<lastname>Ria</lastname>
<name>Chris</name>
<id>C87</id>
</customer>
</records>

how to remove duplicate elements

I need to extract the customer names (the customer elements whose name attribute is "Name"), remove the duplicates, and save the result in a variable. The input can be any dummy XML.
The response variable should contain only this:
<customer name="Name">John; Kevin; Leon; Adam</customer>
I used this stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:variable name="request">
<customers>
<customer name="Address">1 Doe Place</customer>
<customer name="State">OH</customer>
<customer name="Name">John</customer>
<customer name="Name">Kevin</customer>
<customer name="Name">Leon</customer>
<customer name="Name">Adam</customer>
<customer name="Name">Leon</customer>
<customer name="Name">Adam</customer>
<customer name="Name">John</customer>
<customer name="city">Columbus</customer>
</customers>
</xsl:variable>
<xsl:variable name="response" >
<xsl:for-each select="$request/customers/customer[@name = 'Name']">
<xsl:copy-of select="./text()"/>
<xsl:if test="position()!=last()">; </xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:copy-of select="$response"/>
</xsl:template>
</xsl:stylesheet>
This generates
<customer name="Name">John; Kevin; Leon; Adam; Leon; Adam; John</customer>
but it does not remove the duplicate names.
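One possible fix, sketched here under the assumption that the processor runs the stylesheet as XSLT 2.0 (path expressions into `$request` are not allowed in strict XSLT 1.0 without a node-set extension), is to keep a name only when no preceding sibling carries the same value. The literal customer wrapper element is added here to match the desired output.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <xsl:variable name="request">
      <customers>
        <customer name="Address">1 Doe Place</customer>
        <customer name="State">OH</customer>
        <customer name="Name">John</customer>
        <customer name="Name">Kevin</customer>
        <customer name="Name">Leon</customer>
        <customer name="Name">Adam</customer>
        <customer name="Name">Leon</customer>
        <customer name="Name">Adam</customer>
        <customer name="Name">John</customer>
        <customer name="city">Columbus</customer>
      </customers>
    </xsl:variable>
    <xsl:variable name="response">
      <customer name="Name">
        <!-- Keep a Name entry only if no earlier Name entry has the same text. -->
        <xsl:for-each select="$request/customers/customer[@name = 'Name']
                              [not(. = preceding-sibling::customer[@name = 'Name'])]">
          <xsl:value-of select="."/>
          <xsl:if test="position() != last()">; </xsl:if>
        </xsl:for-each>
      </customer>
    </xsl:variable>
    <xsl:copy-of select="$response"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the duplicates are filtered in the select expression, position() and last() refer to the de-duplicated list, so the separator logic from the original stylesheet still works. For long lists a key-based (Muenchian) grouping would avoid rescanning the preceding siblings.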

Sorting and moving elements into a new element

Even with all the good tips on this site, I still have some trouble with my xslt. I'm pretty new to it. I have this source file:
<?xml version="1.0" encoding="utf-8"?>
<file>
<id>1</id>
<row type="A">
<name>ABC</name>
</row>
<row type="B">
<name>BCA</name>
</row>
<row type="A">
<name>CBA</name>
</row>
</file>
and I want to add an element and sort the rows on type, to get this result
<file>
<id>1</id>
<details>
<row type="A">
<name>ABC</name>
</row>
<row type="A">
<name>CBA</name>
</row>
<row type="B">
<name>BCA</name>
</row>
</details>
</file>
I'm able to sort the rows using this:
<xsl:template match="file">
<xsl:copy>
<xsl:apply-templates select="@*/row"/>
<xsl:apply-templates>
<xsl:sort select="@type" data-type="text"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
and I'm able to move the rows using this
<xsl:template match="file">
<xsl:copy>
<xsl:copy-of select="@*" />
<xsl:apply-templates select="*[not(name(.)='row')]" />
<details>
<xsl:apply-templates select="row" />
</details>
</xsl:copy>
</xsl:template>
but I'm not able to produce the correct result when I try to combine them. Hopefully I will understand more of XSLT once I see how the two are combined. Since I'm creating a new element <details>, I think the sorting has to be done before the new <details> element is created. I have to use XSLT 1.0.
Something like this seems to work:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="file">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:copy-of select="row[1]/preceding-sibling::*" />
<details>
<xsl:for-each select="row">
<xsl:sort select="@type" data-type="text"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</details>
<xsl:copy-of select="row[last()]/following-sibling::*" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Here is the result I got:
<?xml version="1.0" encoding="utf-8"?>
<file>
<id>1</id>
<details>
<row type="A">
<name>ABC</name>
</row>
<row type="A">
<name>CBA</name>
</row>
<row type="B">
<name>BCA</name>
</row>
</details>
</file>
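For comparison, the asker's two templates can also be combined directly: the sort simply goes inside the apply-templates that fills the new details element. This is a sketch that assumes an identity template is acceptable for copying the nodes; unlike the stylesheet above, it places every non-row child before details, even one that follows the rows in the source.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copies any node that has no more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="file">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Everything that is not a row keeps its relative order. -->
      <xsl:apply-templates select="*[not(self::row)]"/>
      <details>
        <!-- The rows are sorted while they are moved into details. -->
        <xsl:apply-templates select="row">
          <xsl:sort select="@type" data-type="text"/>
        </xsl:apply-templates>
      </details>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

xsl:sort is stable, so rows with the same type stay in document order.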

xslt on distinct nodes?

I have the following schema:
<parent>
<child id="1" name="Child 1 Version 1" />
</parent>
<parent>
<child id="2" name="Child 2 Version 1" />
</parent>
<parent>
<child id="1" name="Child 1 Version 2" />
</parent>
I want to handle only the last node for each id. Below is what I have tried based on some reading:
<xsl:for-each select="//parent/child">
<xsl:sort select="@id"/>
<xsl:if test="not(@id=following-sibling::*/@id)">
<xsl:element name="child">
<xsl:value-of select="@name"/>
</xsl:element>
</xsl:if>
</xsl:for-each>
But it does not seem to work. My output still contains all three elements. Any ideas on what I can do to correct my issue?
The problem with this code is that although the nodes are processed in sorted order, their following-sibling nodes are still the ones in the source document.
For this code to work, one would first have to create an entirely new document in which the nodes are sorted in the desired way (in XSLT 1.0 this requires applying the xxx:node-set() extension function to the resulting RTF to turn it into an ordinary node-set). In that new document the nodes have the desired siblings.
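The two-pass approach just described could look roughly like the sketch below. It assumes the processor supports the EXSLT exsl:node-set() function and that the parent elements are wrapped in a single top element; the variable name sorted-rtf is made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/*">
    <!-- Pass 1: copy the child elements, sorted by id, into a temporary tree. -->
    <xsl:variable name="sorted-rtf">
      <xsl:for-each select="parent/child">
        <xsl:sort select="@id"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:variable>
    <t>
      <!-- Pass 2: in the temporary tree the copies really are siblings,
           so following-sibling now sees the sorted neighbours. -->
      <xsl:for-each select="exsl:node-set($sorted-rtf)/child[not(@id = following-sibling::child/@id)]">
        <child>
          <xsl:value-of select="@name"/>
        </child>
      </xsl:for-each>
    </t>
  </xsl:template>
</xsl:stylesheet>
```

The sort is stable, so within one id the copies keep their document order and the last version of each id is the one that survives the filter.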
Solution:
This transformation presents one possible XSLT 1.0 solution that does not require the use of extension functions:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kchildById" match="child" use="@id"/>
<xsl:template match="/*">
<t>
<xsl:apply-templates select=
"*/child[generate-id() = generate-id(key('kchildById', @id)[last()])]"/>
</t>
</xsl:template>
<xsl:template match="child">
<child>
<xsl:value-of select="@name"/>
</child>
</xsl:template>
</xsl:stylesheet>
when applied to the provided XML fragment (wrapped in a top element to make it a well-formed XML document, and with a second version added for id="2"):
<t>
<parent>
<child id="1" name="Child 1 Version 1" />
</parent>
<parent>
<child id="2" name="Child 2 Version 1" />
</parent>
<parent>
<child id="1" name="Child 1 Version 2" />
</parent>
<parent>
<child id="2" name="Child 2 Version 2" />
</parent>
</t>
produces the wanted result:
<t>
<child>Child 1 Version 2</child>
<child>Child 2 Version 2</child>
</t>
Note the use of the Muenchian method for grouping.
This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="kParentByChildId" match="parent" use="child/@id"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="parent[count(.|key('kParentByChildId',
child/@id)[last()]) != 1]"/>
</xsl:stylesheet>
Output:
<root>
<parent>
<child id="2" name="Child 2 Version 1"></child>
</parent>
<parent>
<child id="1" name="Child 1 Version 2"></child>
</parent>
</root>
Note: this groups by @id and selects the last of each group.
Edit: Just in case this is confusing, the stylesheet above means: copy everything except those parent elements whose child is not the last one with its @id. So it does not select the last of the group; by reverse logic, it strips everything that is not last in its group.
Second, why is yours not working? Because of the following-sibling axis. Your method for finding the first of a kind dates from a time when few processors implemented keys. Those days are gone.
So, this stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:template match="t">
<xsl:for-each select="parent/child">
<xsl:sort select="@id"/>
<xsl:if test="not(@id=following::child/@id)">
<xsl:element name="child">
<xsl:value-of select="@name"/>
</xsl:element>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Output:
<child>Child 1 Version 2</child>
<child>Child 2 Version 1</child>
Note the use of the following axis: the child elements have no siblings.

Not IN equivalent in XPath expression

I am working on an XSL development and need to know the NOT IN equivalent in XPath.
I am presenting the XML and XSL in the simplest form, which should be understandable to all.
<?xml-stylesheet type="text/xsl" href="XSL.xsl"?>
<Message>
<Customers>
<Customer pin="06067">1</Customer>
<Customer pin="06068">2</Customer>
<Customer pin="06069">3</Customer>
<Customer pin="06070">4</Customer>
<Customer pin="06072">5</Customer>
</Customers>
<Addresses>
<Address pin1="06067">A</Address>
<Address pin1="06068">B</Address>
<Address pin1="06069">C</Address>
</Addresses>
</Message>
XSL
<xsl:template match="/Message">
<html>
<body>
<h4>Existing Customers</h4>
<table>
<xsl:apply-templates select="//Customers/Customer[@pin = //Addresses/Address/@pin1]"></xsl:apply-templates>
</table>
<h4>New Customers</h4>
<table>
<!--This place need to be filled with new customers-->
</table>
</body>
</html>
</xsl:template>
<xsl:template match="Customer" name="Customer">
<xsl:variable name="pin" select="./@pin"></xsl:variable>
<tr>
<td>
<xsl:value-of select="."/>
<xsl:text> is in </xsl:text>
<xsl:value-of select="//Addresses/Address[@pin1=$pin]"/>
</td>
</tr>
</xsl:template>
In the above XSLT, under the commented area, I need to match and display the customers whose address does not exist in the Addresses/Address nodes.
Please help me find an XPath expression that matches the Customers who are NOT IN the Addresses node-set. (Any alternative would also help.)
In XPath 1.0:
/Message/Customers/Customer[not(@pin=/Message/Addresses/Address/@pin1)]
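As a rough sketch of how that expression could fill the "New Customers" table from the question: the mode name new and the " has no address" text are assumptions added for illustration, so that new customers do not go through the existing Customer template with its " is in " wording.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/Message">
    <html>
      <body>
        <h4>New Customers</h4>
        <table>
          <!-- not(a = b) is true only when NO address pin matches.
               Writing @pin != .../@pin1 instead would be true as soon as
               ANY address pin differs, which is not a NOT IN test. -->
          <xsl:apply-templates mode="new"
              select="Customers/Customer[not(@pin = /Message/Addresses/Address/@pin1)]"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- Hypothetical row layout for customers without an address. -->
  <xsl:template match="Customer" mode="new">
    <tr>
      <td>
        <xsl:value-of select="."/>
        <xsl:text> has no address</xsl:text>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document, only the customers with pin 06070 and 06072 are selected, because every other pin has a matching Address.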
An alternative to the good answer by @Alejandro, which I upvoted, is the following transformation, which uses keys and will be more efficient if the number of existing customers is large:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kexistingByPin"
match="Address" use="@pin1"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select=
"*/*/Customer[not(key('kexistingByPin', @pin))]"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<Message>
<Customers>
<Customer pin="06067">1</Customer>
<Customer pin="06068">2</Customer>
<Customer pin="06069">3</Customer>
<Customer pin="06070">4</Customer>
<Customer pin="06072">5</Customer>
</Customers>
<Addresses>
<Address pin1="06067">A</Address>
<Address pin1="06068">B</Address>
<Address pin1="06069">C</Address>
</Addresses>
</Message>
the wanted, correct answer is produced:
<Customer pin="06070">4</Customer>
<Customer pin="06072">5</Customer>