Transform XML - Form a proper record - xslt

I am trying to create an XSLT stylesheet to transform an XML document and am having trouble identifying the record boundaries. Below is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<mheader>
<mid>1</mid>
<mname>mn</mname>
</mheader>
<cheader>
<cid>1</cid>
<cname>cn</cname>
</cheader>
<lheader>
<lid>1</lid>
<lname>ln</lname>
</lheader>
<aheader>
<aid>1</aid>
<aname>an</aname>
</aheader>
<pos>
<pid>1</pid>
<pname>pay</pname>
</pos>
<pos>
<pid>2</pid>
<pname>pay1</pname>
</pos>
<mheader>
<mid>2</mid>
<mname>mh1</mname>
</mheader>
<cheader>
<cid>2</cid>
<cname>ch1</cname>
</cheader>
<lheader>
<lid>2</lid>
<lname>lh1</lname>
</lheader>
<aheader>
<aid>2</aid>
<aname>ah1</aname>
</aheader>
<pos>
<pid>1</pid>
<pname>pay</pname>
</pos>
<pos>
<pid>2</pid>
<pname>pay3</pname>
</pos>
<pos>
<pid>3</pid>
<pname>pay4</pname>
</pos>
</catalog>
I need to transform my XML into the following:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<record>
<mheader>
<mid>1</mid>
<mname>mn</mname>
</mheader>
<cheader>
<cid>1</cid>
<cname>cn</cname>
</cheader>
<lheader>
<lid>1</lid>
<lname>ln</lname>
</lheader>
<aheader>
<aid>1</aid>
<aname>an</aname>
</aheader>
<pos>
<pid>1</pid>
<pname>pay</pname>
</pos>
<pos>
<pid>2</pid>
<pname>pay1</pname>
</pos>
</record>
<record>
<mheader>
<mid>2</mid>
<mname>mh1</mname>
</mheader>
<cheader>
<cid>2</cid>
<cname>ch1</cname>
</cheader>
<lheader>
<lid>2</lid>
<lname>lh1</lname>
</lheader>
<aheader>
<aid>2</aid>
<aname>ah1</aname>
</aheader>
<pos>
<pid>1</pid>
<pname>pay</pname>
</pos>
<pos>
<pid>2</pid>
<pname>pay3</pname>
</pos>
<pos>
<pid>3</pid>
<pname>pay4</pname>
</pos>
</record>
</catalog>
A record should start at an mheader element and end at the last pos element before the next mheader.
This is what I have tried so far:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<record>
<xsl:apply-templates select="catalog/mheader"/>
<xsl:apply-templates select="catalog/cheader"/>
<xsl:apply-templates select="catalog/lheader"/>
<xsl:apply-templates select="catalog/aheader"/>
<xsl:apply-templates select="catalog/pos"/>
</record>
</xsl:template>
</xsl:stylesheet>
Any ideas on how to form a proper record in this case?

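<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- identity transform: copies everything not matched by a more specific template -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="catalog">
<xsl:copy>
<!-- start a new group at every mheader; each group becomes one record -->
<xsl:for-each-group select="*" group-starting-with="mheader">
<record>
<xsl:copy-of select="current-group()"/>
</record>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This uses xsl:for-each-group, so it requires an XSLT 2.0 (or later) processor.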
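To check the expected grouping outside an XSLT processor, here is a rough equivalent written with Python's standard-library xml.etree.ElementTree (not XSLT). It is only a sketch: the helper names group_starting_with and wrap_records are hypothetical, and, as with group-starting-with, any children that appear before the first mheader would form a group of their own.

```python
import xml.etree.ElementTree as ET


def group_starting_with(parent, tag):
    """Split parent's children into groups; a new group starts at each `tag` child.

    Mirrors xsl:for-each-group group-starting-with: the first group always
    starts at the first child, even if that child is not a `tag` element.
    """
    groups = []
    for child in parent:
        if child.tag == tag or not groups:
            groups.append([])
        groups[-1].append(child)
    return groups


def wrap_records(xml_text, start_tag="mheader", wrapper="record"):
    """Return a copy of the root whose children are wrapped into <record> groups."""
    catalog = ET.fromstring(xml_text)
    result = ET.Element(catalog.tag)
    for group in group_starting_with(catalog, start_tag):
        record = ET.SubElement(result, wrapper)
        record.extend(group)  # reuses the original element objects under <record>
    return result


if __name__ == "__main__":
    src = ("<catalog><mheader><mid>1</mid></mheader><pos><pid>1</pid></pos>"
           "<mheader><mid>2</mid></mheader><pos><pid>1</pid></pos></catalog>")
    print(ET.tostring(wrap_records(src), encoding="unicode"))
```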

Split large XML to smaller chunks by node count using XSLT

I have a requirement where we receive a large XML file that I need to transform in small chunks.
Below is an XML sample with 4 records; I need to transform it so that the records are grouped in chunks of 2.
<!-- Original XML-->
<EmpDetails>
<Records>
<EmpID>1</EmpID>
<Age>20</Age>
</Records>
<Records>
<EmpID>2</EmpID>
<Age>21</Age>
</Records>
<Records>
<EmpID>3</EmpID>
<Age>22</Age>
</Records>
<Records>
<EmpID>4</EmpID>
<Age>23</Age>
</Records>
</EmpDetails>
<!-- Expected XML-->
<EmpDetails>
<Split>
<Records>
<EmpID>1</EmpID>
<Age>20</Age>
</Records>
<Records>
<EmpID>2</EmpID>
<Age>21</Age>
</Records>
</Split>
<Split>
<Records>
<EmpID>3</EmpID>
<Age>22</Age>
</Records>
<Records>
<EmpID>4</EmpID>
<Age>23</Age>
</Records>
</Split>
</EmpDetails>
I tried a few things, including the following, without success:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<EmpDetails>
<xsl:for-each select="/EmpDetails/Records">
<Split>
<Records>
<EmpID>
<xsl:value-of select="EmpID"/>
</EmpID>
<Age>
<xsl:value-of select="Age"/>
</Age>
</Records>
</Split>
</xsl:for-each>
</EmpDetails>
</xsl:template>
</xsl:stylesheet>
Thanks
Yatan
Grouping the records in chunks of 2 can be done simply with:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/EmpDetails">
<xsl:copy>
<xsl:for-each select="Records[position() mod 2 = 1]">
<Split>
<xsl:copy-of select=". | following-sibling::Records[1]"/>
</Split>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Added:
To divide the records into groups of 200, you can do:
...
<xsl:for-each select="Records[position() mod 200 = 1]">
<Split>
<xsl:copy-of select=". | following-sibling::Records[position() &lt; 200]"/>
</Split>
</xsl:for-each>
...
In XSLT 2.0 you could do:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/EmpDetails">
<xsl:copy>
<xsl:for-each-group select="Records" group-adjacent="(position() - 1) idiv 200">
<Split>
<xsl:copy-of select="current-group()"/>
</Split>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
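The (position() - 1) idiv 200 key maps records 1 to 200 to key 0, records 201 to 400 to key 1, and so on, so group-adjacent closes a Split every 200 records. A minimal Python sketch of the same arithmetic using xml.etree.ElementTree (the function name split_records and its default arguments are assumptions for illustration):

```python
import xml.etree.ElementTree as ET


def split_records(xml_text, size=2, item_tag="Records", wrapper="Split"):
    """Wrap consecutive item_tag children into <Split> elements of `size` items."""
    root = ET.fromstring(xml_text)
    result = ET.Element(root.tag)
    current_key = None
    split = None
    for position, rec in enumerate(root.findall(item_tag), start=1):
        key = (position - 1) // size  # same value as (position() - 1) idiv size
        if key != current_key:        # group-adjacent: a new key starts a new group
            split = ET.SubElement(result, wrapper)
            current_key = key
        split.append(rec)
    return result
```

With size=200 this reproduces the XSLT 2.0 grouping above; the last Split simply holds whatever records are left over.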
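Another suggestion was to use this code:
<xsl:for-each select="Records[position() mod 2 = 0]">
instead of this:
<xsl:for-each select="Records[position() mod 2 = 1]">
With mod 2 = 0, however, each Split starts at an even-numbered record (2, then 4), so record 1 is never copied and record 4 ends up in a Split of its own; the mod 2 = 1 version produces the expected output.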

Compare two xml tree nodes and find if a node with a value exists in another using xslt

I have an XML input that merges two XML documents:
<DATA>
<RECORDS1>
<RECORD>
<id>11</id>
<value>123</value>
</RECORD>
<RECORD>
<id>33</id>
<value>321</value>
</RECORD>
<RECORD>
<id>55</id>
<value>121113</value>
</RECORD>
...
</RECORDS1>
<RECORDS2>
<RECORD>
<id>11</id>
<value>123</value>
</RECORD>
<RECORD>
<id>33</id>
<value>323</value>
</RECORD>
<RECORD>
<id>44</id>
<value>12333</value>
</RECORD>
...
</RECORDS2>
</DATA>
I need to copy to the output the records in RECORDS1 for which either:
The record does not exist in RECORDS2, or
The record exists in RECORDS2 but its value is different.
In addition, each output record should get an extra field, kind, whose value is New (when the record does not exist in RECORDS2) or Change (when it exists but the value is different).
Output
<DATA>
<RECORDS>
<RECORD>
<id>33</id>
<value>321</value>
<kind>Change</kind>
</RECORD>
<RECORD>
<id>55</id>
<value>121113</value>
<kind>New</kind>
</RECORD>
...
</RECORDS>
</DATA>
I tried a for-each loop, but since variables in XSLT cannot be reassigned, it does not work.
Any ideas?
Perhaps
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="2.0">
<xsl:output indent="yes"/>
<xsl:key name="rec2-complete" match="RECORDS2/RECORD" use="concat(id, '|', value)"/>
<xsl:key name="rec2-id" match="RECORDS2/RECORD" use="id"/>
<xsl:template match="DATA">
<xsl:apply-templates select="RECORDS1/RECORD[not(key('rec2-complete', concat(id, '|', value)))]"/>
</xsl:template>
<xsl:template match="RECORDS1/RECORD">
<xsl:copy>
<xsl:copy-of select="node()"/>
<merged>
<xsl:value-of select="if (key('rec2-id', id)/value != value) then 'changed' else 'new'"/>
</merged>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
implements the requirements.
Or, to construct the complete result you have shown now, use
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="2.0">
<xsl:output indent="yes"/>
<xsl:key name="rec2-complete" match="RECORDS2/RECORD" use="concat(id, '|', value)"/>
<xsl:key name="rec2-id" match="RECORDS2/RECORD" use="id"/>
<xsl:template match="DATA">
<xsl:copy>
<RECORDS>
<xsl:apply-templates select="RECORDS1/RECORD[not(key('rec2-complete', concat(id, '|', value)))]"/>
</RECORDS>
</xsl:copy>
</xsl:template>
<xsl:template match="RECORDS1/RECORD">
<xsl:copy>
<xsl:copy-of select="node()"/>
<kind>
<xsl:value-of select="if (key('rec2-id', id)/value != value) then 'Change' else 'New'"/>
</kind>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
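The two keys act as lookup tables: one on id|value to discard exact matches, and one on id to decide between Change and New. Sketched in Python with a set in place of each xsl:key (a hypothetical diff_records helper built on xml.etree.ElementTree, assuming each RECORD has exactly one id and one value):

```python
import xml.etree.ElementTree as ET


def diff_records(xml_text):
    """Return <DATA><RECORDS> with RECORDS1 entries that are new or changed."""
    data = ET.fromstring(xml_text)
    rec2 = data.findall("RECORDS2/RECORD")
    # Stand-ins for the two xsl:key declarations.
    rec2_complete = {(r.findtext("id"), r.findtext("value")) for r in rec2}
    rec2_ids = {r.findtext("id") for r in rec2}

    result = ET.Element("DATA")
    records = ET.SubElement(result, "RECORDS")
    for r in data.findall("RECORDS1/RECORD"):
        rec_id, value = r.findtext("id"), r.findtext("value")
        if (rec_id, value) in rec2_complete:
            continue  # identical record exists in RECORDS2: not reported
        out = ET.SubElement(records, "RECORD")
        ET.SubElement(out, "id").text = rec_id
        ET.SubElement(out, "value").text = value
        # Exact matches were skipped above, so a known id means the value differs.
        ET.SubElement(out, "kind").text = "Change" if rec_id in rec2_ids else "New"
    return result
```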

How to best sort xml records with xslt?

Here's my source XML file; it has records like so:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<A msgVersion="revision2.0" xmlns="abc:def.ghi" >
<B>
<ID>12345</ID>
</B>
<record>
<name>Foo</name>
<recTime>2020-03-30T23:59:36.62Z</recTime>
</record>
<record>
<name>Bar</name>
<recTime>2020-03-31T23:59:36.62Z</recTime>
</record>
<record>
<name>Car</name>
<recTime>2020-03-29T23:59:36.62Z</recTime>
</record>
</A>
I want to transform it so that all the records are sorted by "recTime", like so:
<?xml version="1.0" encoding="UTF-8"?>
<A msgVersion="revision2.0" xmlns="abc:def.ghi">
<B>
<ID>12345</ID>
</B>
<record>
<name>Car</name>
<recTime>2020-03-29T23:59:36.62Z</recTime>
</record>
<record>
<name>Foo</name>
<recTime>2020-03-30T23:59:36.62Z</recTime>
</record>
<record>
<name>Bar</name>
<recTime>2020-03-31T23:59:36.62Z</recTime>
</record>
</A>
I'm playing with xslt, but I'm not familiar with it. Here's what I have so far:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ss="abc:def.ghi">
<xsl:output method="xml" version="1.0" omit-xml-declaration="no" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="newline">
<xsl:text>
</xsl:text>
</xsl:variable>
<xsl:template match="/">
<A>
<xsl:value-of select="$newline"/>
<xsl:copy-of select="/ss:*/ss:B" />
<xsl:for-each select="/ss:*/ss:record">
<xsl:sort select="ss:recTime"/>
<record>
<xsl:value-of select="$newline"/>
<name><xsl:value-of select="ss:name"/></name>
<xsl:value-of select="$newline"/>
<recTime><xsl:value-of select="ss:recTime"/></recTime>
<xsl:value-of select="$newline"/>
</record>
</xsl:for-each>
</A>
</xsl:template>
</xsl:stylesheet>
Here's what it outputs:
<?xml version="1.0" encoding="UTF-8"?>
<A xmlns:ss="abc:def.ghi">
<B xmlns="abc:def.ghi"><ID>12345</ID></B><record>
<name>Car</name>
<recTime>2020-03-29T23:59:36.62Z</recTime>
</record><record>
<name>Foo</name>
<recTime>2020-03-30T23:59:36.62Z</recTime>
</record><record>
<name>Bar</name>
<recTime>2020-03-31T23:59:36.62Z</recTime>
</record></A>
It has a few problems:
The root element 'A' is missing the attribute msgVersion="revision2.0"
Element 'B' should not have a namespace, since it did not in the source xml
The xml formatting (spacing and new lines) isn't pretty, but that's not high priority
I also have a gut feeling I'm doing things the hard way. Any help would be appreciated.
How about:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ss="abc:def.ghi">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<!-- identity transform -->
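<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/ss:A">
<xsl:copy>
<!-- copy the root's attributes (msgVersion) and B unchanged, then the records sorted by recTime -->
<xsl:apply-templates select="@* | ss:B"/>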
<xsl:apply-templates select="ss:record">
<xsl:sort select="ss:recTime"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
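Because the recTime values share one ISO 8601 format and the same Z time zone, the default text sort of xsl:sort already puts them in chronological order. The same idea as a Python sketch with xml.etree.ElementTree: the namespace URI comes from the sample, while the sort_records name and the assumption that the record elements are the last children of A are illustrative.

```python
import xml.etree.ElementTree as ET

NS = "abc:def.ghi"
# Serialize abc:def.ghi as the default namespace (a global ElementTree setting).
ET.register_namespace("", NS)


def sort_records(xml_text):
    """Sort the root's <record> children by recTime, keeping attributes and <B>."""
    root = ET.fromstring(xml_text)
    records = root.findall(f"{{{NS}}}record")
    for rec in records:
        root.remove(rec)
    # Plain string comparison, like xsl:sort's default data-type="text".
    records.sort(key=lambda r: r.findtext(f"{{{NS}}}recTime"))
    root.extend(records)  # assumes records are the last children, as in the sample
    return root
```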

Verticalize XML using XSLT

I am trying to implement what at first looks like a simple transformation, but everything I have tried has failed.
The XML is generated from a fixed-length record and has the following format.
<?xml version="1.0" encoding="UTF-8"?>
<record>
<no_of_records>30</no_of_records>
<cust_lastname_1>Smith</cust_lastname_1>
<cust_name_1>John</cust_name_1>
<cust_id_1>X45</cust_id_1>
<cust_lastname_2>George</cust_lastname_2>
<cust_name_2>Michael</cust_name_2>
<cust_id_2>X76</cust_id_2>
<cust_lastname_3>Ria</cust_lastname_3>
<cust_name_3>Chris</cust_name_3>
<cust_id_3>C87</cust_id_3>
...
</record>
no_of_records indicates how many numbered (_1, _2, ...) groups of elements each record contains, and because of its fixed-length origin it has a defined maximum.
I want to transform it into a “verticalized” form resembling the one below.
<record>
<customer num="1">
<lastname>Smith</lastname>
<name>John</name>
<id>X45</id>
</customer>
<customer num="2">
<lastname>George</lastname>
<name>Michael</name>
<id>X76</id>
</customer>
<customer num="3">
<lastname>Ria</lastname>
<name>Chris</name>
<id>C87</id>
...
</customer>
</record>
Any help would be greatly appreciated.
In XSLT 2.0, you want something like
<xsl:for-each-group select="*[starts-with(local-name(), 'cust_')]" group-starting-with="*[starts-with(local-name(), 'cust_lastname')]">
<customer num="{position()}">
<xsl:apply-templates select="current-group()"/>
</customer>
</xsl:for-each-group>
....
<xsl:template match="*[starts-with(local-name(), 'cust')]">
<xsl:element name="{replace(local-name(), 'cust_(.*?)_[0-9]+', '$1')}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
The solution from @Michael Kay works fine. Thank you!
XML
<?xml version="1.0" encoding="UTF-8"?>
<record>
<no_of_records>3</no_of_records>
<cust_lastname_1>Smith</cust_lastname_1>
<cust_name_1>John</cust_name_1>
<cust_id_1>X45</cust_id_1>
<cust_lastname_2>George</cust_lastname_2>
<cust_name_2>Michael</cust_name_2>
<cust_id_2>X76</cust_id_2>
<cust_lastname_3>Ria</cust_lastname_3>
<cust_name_3>Chris</cust_name_3>
<cust_id_3>C87</cust_id_3>
</record>
XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="record">
<records>
<xsl:for-each-group select="*[starts-with(local-name(), 'cust_')]"
group-starting-with="*[starts-with(local-name(), 'cust_lastname')]">
<customer num="{position()}">
<xsl:apply-templates select="current-group()"/>
</customer>
</xsl:for-each-group>
</records>
</xsl:template>
<xsl:template match="*[starts-with(local-name(), 'cust')]">
<xsl:element name="{replace(local-name(), 'cust_(.*?)_[0-9]+', '$1')}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Result
<?xml version="1.0" encoding="UTF-8"?>
<records>
<customer num="1">
<lastname>Smith</lastname>
<name>John</name>
<id>X45</id>
</customer>
<customer num="2">
<lastname>George</lastname>
<name>Michael</name>
<id>X76</id>
</customer>
<customer num="3">
<lastname>Ria</lastname>
<name>Chris</name>
<id>C87</id>
</customer>
</records>
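The grouping and renaming steps can also be mirrored in Python for testing, as a sketch using xml.etree.ElementTree and re. The verticalize function name is hypothetical; like the stylesheet, it skips no_of_records, starts a new customer at each cust_lastname_N element, and strips the cust_ prefix and _N suffix with the same regular expression.

```python
import re
import xml.etree.ElementTree as ET


def verticalize(xml_text):
    """Turn flat cust_<field>_<n> elements into numbered <customer> elements."""
    record = ET.fromstring(xml_text)
    result = ET.Element("records")
    customer = None
    count = 0
    for child in record:
        if not child.tag.startswith("cust_"):
            continue  # e.g. no_of_records, excluded just like the XSLT select
        if child.tag.startswith("cust_lastname") or customer is None:
            count += 1  # plays the role of position() inside for-each-group
            customer = ET.SubElement(result, "customer", num=str(count))
        # cust_lastname_1 -> lastname, same pattern as the XSLT replace()
        name = re.sub(r"cust_(.*?)_[0-9]+", r"\1", child.tag)
        ET.SubElement(customer, name).text = child.text
    return result
```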

XSLT To filter out records with letters

We have a requirement to filter out records that have characters in numeric fields and report them separately. I came across the following question, which has already been answered:
XPATH To filter out records with letters
However, is there a way to mark these records with a flag, or collect them in a variable, since we need to report them as invalid records? If we delete them completely, we have no way of knowing which records were invalid.
Please suggest.
Thank You!
Input:
<?xml version="1.0" encoding="UTF-8"?>
<payload>
<records>
<record>
<number>123</number>
</record>
<record>
<number>456</number>
</record>
<record>
<number>78A</number>
</record>
</records>
</payload>
Output:
<?xml version="1.0" encoding="UTF-8"?>
<payload>
<records>
<record>
<number>123</number>
</record>
<record>
<number>456</number>
</record>
</records>
</payload>
XSLT solution from the link above:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record[translate(number, '0123456789', '')]"/>
</xsl:stylesheet>
After the match, output the original element with whatever "flag" you want (attribute, comment, processing instruction, etc.).
Example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- a record whose number is not numeric is copied with invalid="true" -->
<xsl:template match="record[string(number(number))='NaN']">
<record invalid="true">
<xsl:apply-templates select="@*|node()"/>
</record>
</xsl:template>
</xsl:stylesheet>
Output
<payload>
<records>
<record>
<number>123</number>
</record>
<record>
<number>456</number>
</record>
<record invalid="true">
<number>78A</number>
</record>
</records>
</payload>
You can still use your original match if you'd like.
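The string(number(...)) = 'NaN' test relies on XPath 1.0 number(), which accepts only optional whitespace, an optional minus sign, and plain decimal digits (with an optional fractional part); anything else, including 78A or exponent forms such as 1e5, becomes NaN. A Python sketch that approximates that rule with a regular expression (is_xpath_nan and flag_invalid are hypothetical names):

```python
import re
import xml.etree.ElementTree as ET

# Approximation of the XPath 1.0 string-to-number rule:
# whitespace? '-'? (Digits ('.' Digits?)? | '.' Digits) whitespace?
XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def is_xpath_nan(text):
    """True if number() would return NaN for this string (None means no element)."""
    return XPATH_NUMBER.fullmatch(text or "") is None


def flag_invalid(xml_text):
    """Add invalid="true" to each <record> whose first <number> is not numeric."""
    payload = ET.fromstring(xml_text)
    for record in payload.iter("record"):
        if is_xpath_nan(record.findtext("number")):
            record.set("invalid", "true")
    return payload
```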
Edit: to handle multiple number fields and identify the specific invalid fields (columns) at the record level:
Modified XML input example:
<payload>
<records>
<record>
<number>123</number>
</record>
<record>
<number>456</number>
</record>
<record>
<number>321</number>
<number>78A</number>
<number>654</number>
<number>abc</number>
</record>
</records>
</payload>
Updated XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record">
<!-- collect the positions of non-numeric number children, e.g. "2 4 " -->
<xsl:variable name="invalidCols">
<xsl:apply-templates select="*" mode="invalid"/>
</xsl:variable>
<record>
<xsl:if test="string($invalidCols)">
<xsl:attribute name="invalidCols">
<xsl:value-of select="normalize-space($invalidCols)"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates select="@*|node()"/>
</record>
</xsl:template>
<xsl:template match="number[string(number(.))='NaN']" mode="invalid">
<xsl:number/>
<xsl:text> </xsl:text>
</xsl:template>
<xsl:template match="*" mode="invalid"/>
</xsl:stylesheet>
Output
<payload>
<records>
<record>
<number>123</number>
</record>
<record>
<number>456</number>
</record>
<record invalidCols="2 4">
<number>321</number>
<number>78A</number>
<number>654</number>
<number>abc</number>
</record>
</records>
</payload>
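With its defaults, <xsl:number/> numbers each invalid number element by its position among its number siblings, which is where "2 4" comes from. A rough Python counterpart (the mark_invalid_columns name is hypothetical, and the regular expression is the same approximation of the XPath 1.0 number rule):

```python
import re
import xml.etree.ElementTree as ET

# Approximation of XPath 1.0 number(): anything not matching converts to NaN.
XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def mark_invalid_columns(xml_text):
    """Set invalidCols="i j ..." on records with non-numeric <number> children."""
    payload = ET.fromstring(xml_text)
    for record in payload.iter("record"):
        # 1-based position among the record's <number> children, like <xsl:number/>
        bad = [str(i) for i, num in enumerate(record.findall("number"), start=1)
               if XPATH_NUMBER.fullmatch(num.text or "") is None]
        if bad:
            record.set("invalidCols", " ".join(bad))
    return payload
```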