How to Efficiently compare 2 large volume XML files - xslt

-- EDIT -- Clarified the documents and the desired output (this also explains the variance from the first response).
I'm trying to compare 2 large XML data sets using XSLT 2.0 (I can also use 3.0) and I'm having some performance issues.
I have ~300k records in file 1 that I need to compare against another ~300k records in file 2 to see whether entries from file 1 exist in file 2. If so, I need to insert a node into the result. I also need to exclude certain record types from file 1.
File 1
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<col1>100035</col1>
<col2>3000009091</col2>
<col3>SSL</col3>
<col4>8.000000</col4>
<col5>06-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<col1>100002</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>07-Jul-2020</col5>
<col6>P</col6>
</row>
<row>
<col1>100028</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>08-Jul-2020</col5>
<col6>P</col6>
</row>
<row>
<col1>100200</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>09-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<col1>100689</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>10-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<col1>100035</col1>
<col2>3000013528</col2>
<col3>UFH</col3>
<col4>8.000000</col4>
<col5>16-Jul-2020</col5>
<col6>A</col6>
</row>
</root>
File 2
<?xml version="1.0" encoding="UTF-8"?>
<nm:Data xmlns:nm="namespace">
<nm:Entry>
<nm:Record>
<nm:ID>10084722-Jun-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>48548310-Jul-2020SSL</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10000201-Jul-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>57307407-Jul-2020SSL</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10003516-Jul-2020UFH</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10020009-Jul-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>00155501-Jun-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10533728-May-2020UUT</nm:ID>
</nm:Record>
</nm:Entry>
<nm:Entry>
<nm:Record>
<nm:ID>99954801-Jul-2020UUT</nm:ID>
</nm:Record>
</nm:Entry>
<nm:Entry>
<nm:Record>
<nm:ID>30254801-Jun-2020UFH</nm:ID>
</nm:Record>
</nm:Entry>
</nm:Data>
The Desired Output (copy 'A' records and add a "type" node): "Adj" if there is a matching ID in File 2, otherwise "New":
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<type>New</type>
<col1>100035</col1>
<col2>3000009091</col2>
<col3>SSL</col3>
<col4>8.000000</col4>
<col5>06-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<type>Adj</type>
<col1>100200</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>09-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<type>New</type>
<col1>100689</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>10-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<type>Adj</type>
<col1>100035</col1>
<col2>3000013528</col2>
<col3>UFH</col3>
<col4>8.000000</col4>
<col5>16-Jul-2020</col5>
<col6>A</col6>
</row>
</root>
Originally, I couldn't get the exact output, so I compromised with the following XSLT; however, performance is poor and I need a much more efficient solution.
XSLT Attempt 1 (want to replace exists() & copy-of() functions):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:nm="namespace"
exclude-result-prefixes="xs" version="3.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="report" select="document('File2.xml')"/>
<xsl:template match="root">
<root>
<xsl:for-each select="row[col6 = 'A']">
<record>
<!-- Create value to match against -->
<xsl:variable name="inputID" select="concat(col1,col5,col3)"/>
<!-- Add Node based on existing match or not -->
<xsl:choose>
<xsl:when test="exists($report/nm:Data/nm:Entry/nm:Record/nm:ID[. = $inputID])">
<type>Adj</type>
</xsl:when>
<xsl:otherwise>
<type>New</type>
</xsl:otherwise>
</xsl:choose>
<!-- Copy all other nodes -->
<xsl:copy-of select="."/>
</record>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
Actual Output 1 (not perfect output, but acceptable):
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:nm="namespace">
<record>
<type>New</type>
<row>
<col1>100035</col1>
<col2>3000009091</col2>
<col3>SSL</col3>
<col4>8.000000</col4>
<col5>06-Jul-2020</col5>
<col6>A</col6>
</row>
</record>
<record>
<type>Adj</type>
<row>
<col1>100200</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>09-Jul-2020</col5>
<col6>A</col6>
</row>
</record>
<record>
<type>New</type>
<row>
<col1>100689</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>10-Jul-2020</col5>
<col6>A</col6>
</row>
</record>
<record>
<type>Adj</type>
<row>
<col1>100035</col1>
<col2>3000013528</col2>
<col3>UFH</col3>
<col4>8.000000</col4>
<col5>16-Jul-2020</col5>
<col6>A</col6>
</row>
</record>
</root>
I then took the suggestions below and tried applying both streaming and the key() function in XSLT 3.0, but I've been unable to get anything working. The closest was the XSLT below, but its output is incorrect.
XSLT 3.0 attempt:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:nm="namespace"
exclude-result-prefixes="#all" version="3.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="report" select="document('File2.xml')"/>
<xsl:key name="ref" match="nm:Data/nm:Entry/nm:Record/nm:ID" use="."/>
<xsl:key name="type-ref" match="row" use="col6"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="key('type-ref', 'A')[key('ref', col1 || col3 || col5, $report)]">
<xsl:copy>
<type>Adj</type>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="key('type-ref', 'A')[not(key('ref', col1 || col3 || col5, $report))]">
<xsl:copy>
<type>New</type>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="key('type-ref', 'P')"/>
</xsl:stylesheet>
3.0 Output (note that the "Adj" type is not being applied correctly but P records are being dropped):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<type>New</type>
<col1>100035</col1>
<col2>3000009091</col2>
<col3>SSL</col3>
<col4>8.000000</col4>
<col5>06-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<type>New</type>
<col1>100200</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>09-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<type>New</type>
<col1>100689</col1>
<col2>3000009091</col2>
<col3>UUT</col3>
<col4>8.000000</col4>
<col5>10-Jul-2020</col5>
<col6>A</col6>
</row>
<row>
<type>New</type>
<col1>100035</col1>
<col2>3000013528</col2>
<col3>UFH</col3>
<col4>8.000000</col4>
<col5>16-Jul-2020</col5>
<col6>A</col6>
</row>
</root>
I don't have a deep enough understanding of the key() function to tweak it further, or of how to correctly apply the copy instructions when trying to use streaming mode.
Thank you again for the input & I'll keep trying.

I would use a key (https://www.w3.org/TR/xslt-30/#key) to index the second document and (perhaps additionally) a key to select only certain rows for the whole processing:
<xsl:key name="ref" match="data/id" use="."/>
<xsl:key name="type-ref" match="row" use="type"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="root">
<xsl:copy>
<xsl:apply-templates select="key('type-ref', 'A')"/>
</xsl:copy>
</xsl:template>
<xsl:template match="row[key('ref', id || code || date, $report)]">
<xsl:copy>
<type>Adj</type>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="row[not(key('ref', id || code || date, $report))]">
<xsl:copy>
<type>New</type>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
https://xsltfiddle.liberty-development.net/a9HjZH/2
The arguments to the key function are explained in https://www.w3.org/TR/xslt-30/#func-key:
fn:key( $key-name as xs:string,
$key-value as xs:anyAtomicType*,
$top as node()) as node()*
The third argument is used to identify the selected subtree. If the
argument is present, the selected subtree is the set of nodes that
have $top as an ancestor-or-self node. If the argument is omitted, the
selected subtree is the document containing the context node. This
means that the third argument effectively defaults to /.
Applied to your altered input samples (only difficulty was to concat the colX elements in the order their values appear in the second document) that would give
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:nm="namespace"
exclude-result-prefixes="#all"
version="3.0">
<xsl:param name="report">
<nm:Data xmlns:nm="namespace">
<nm:Entry>
<nm:Record>
<nm:ID>10084722-Jun-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>48548310-Jul-2020SSL</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10000201-Jul-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>57307407-Jul-2020SSL</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10003516-Jul-2020UFH</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10020009-Jul-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>00155501-Jun-2020UUT</nm:ID>
</nm:Record>
<nm:Record>
<nm:ID>10533728-May-2020UUT</nm:ID>
</nm:Record>
</nm:Entry>
<nm:Entry>
<nm:Record>
<nm:ID>99954801-Jul-2020UUT</nm:ID>
</nm:Record>
</nm:Entry>
<nm:Entry>
<nm:Record>
<nm:ID>30254801-Jun-2020UFH</nm:ID>
</nm:Record>
</nm:Entry>
</nm:Data>
</xsl:param>
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="ref" match="nm:Data/nm:Entry/nm:Record/nm:ID" use="."/>
<xsl:key name="type-ref" match="row" use="col6"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="root">
<xsl:copy>
<xsl:apply-templates select="key('type-ref', 'A')"/>
</xsl:copy>
</xsl:template>
<xsl:template match="row[key('ref', col1 || col5 || col3, $report)]">
<xsl:copy>
<type>Adj</type>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="row[not(key('ref', col1 || col5 || col3, $report))]">
<xsl:copy>
<type>New</type>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/a9HjZH/3
Finally, with XSLT 3 and streaming (e.g. with Saxon 9 or 10 EE) you could use a different approach that reads the second document with streaming into a map and then streams through the first input document and performs the template matching on each row materialized in memory:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
exclude-result-prefixes="#all"
version="3.0">
<xsl:param name="doc2-uri" as="xs:string">input-sample2.xml</xsl:param>
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:param name="key-map" as="map(xs:string, xs:boolean)">
<xsl:map>
<xsl:source-document href="{$doc2-uri}" streamable="yes">
<xsl:iterate select="data/id">
<xsl:map-entry key="string()" select="true()"/>
</xsl:iterate>
</xsl:source-document>
</xsl:map>
</xsl:param>
<xsl:mode on-no-match="shallow-copy" streamable="yes"/>
<xsl:template match="root">
<xsl:copy>
<xsl:apply-templates select="row!copy-of()" mode="grounded"/>
</xsl:copy>
</xsl:template>
<xsl:mode name="grounded" on-no-match="shallow-copy"/>
<xsl:template match="row[map:contains($key-map, id || code || date)]" mode="grounded">
<xsl:copy>
<type>Adj</type>
<xsl:apply-templates mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:template match="row[not(map:contains($key-map, id || code || date))]" mode="grounded">
<xsl:copy>
<type>New</type>
<xsl:apply-templates mode="#current"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
or, for the adapted input samples and the clarified requirement that only certain types of rows are to be processed:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:nm="namespace"
exclude-result-prefixes="#all"
version="3.0">
<xsl:param name="doc2-uri" as="xs:string">input2-sample2.xml</xsl:param>
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:param name="key-map" as="map(xs:string, xs:boolean)">
<xsl:map>
<xsl:source-document href="{$doc2-uri}" streamable="yes">
<xsl:iterate select="nm:Data/nm:Entry/nm:Record/nm:ID">
<xsl:map-entry key="string()" select="true()"/>
</xsl:iterate>
</xsl:source-document>
</xsl:map>
</xsl:param>
<xsl:mode on-no-match="shallow-copy" streamable="yes"/>
<xsl:template match="root">
<xsl:copy>
<xsl:apply-templates select="row!copy-of()[col6 = 'A']" mode="grounded"/>
</xsl:copy>
</xsl:template>
<xsl:mode name="grounded" on-no-match="shallow-copy"/>
<xsl:template match="row[map:contains($key-map, col1 || col5 || col3)]" mode="grounded">
<xsl:copy>
<type>Adj</type>
<xsl:apply-templates mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:template match="row[not(map:contains($key-map, col1 || col5 || col3))]" mode="grounded">
<xsl:copy>
<type>New</type>
<xsl:apply-templates mode="#current"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
That should keep the memory consumption for the first document low, even if you have millions of rows. For the second document, it streams through and builds a lightweight map to store the keys instead of holding the complete XML tree and its key index in memory.
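If streaming is not available (for instance, without an EE licence), the same map lookup can be sketched without it. This is only a minimal sketch under some assumptions: the second document is small enough to parse once with doc(), it is named File2.xml and sits next to the stylesheet, and the variable name key-map is reused from above purely for illustration. The first document is held in memory as usual.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:nm="namespace"
    exclude-result-prefixes="#all"
    version="3.0">
  <!-- Assumption: File2.xml is located next to this stylesheet -->
  <xsl:param name="doc2-uri" as="xs:string">File2.xml</xsl:param>
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>
  <!-- One map entry per ID; duplicate IDs are harmless because every value is true() -->
  <xsl:variable name="key-map" as="map(xs:string, xs:boolean)"
      select="map:merge(doc($doc2-uri)/nm:Data/nm:Entry/nm:Record/nm:ID ! map { string(.) : true() })"/>
  <xsl:mode on-no-match="shallow-copy"/>
  <!-- Only rows of type 'A' are processed; all other rows are dropped -->
  <xsl:template match="root">
    <xsl:copy>
      <xsl:apply-templates select="row[col6 = 'A']"/>
    </xsl:copy>
  </xsl:template>
  <!-- The lookup value follows the order used in File 2: col1, col5, col3 -->
  <xsl:template match="row">
    <xsl:copy>
      <type>
        <xsl:value-of select="if (map:contains($key-map, col1 || col5 || col3)) then 'Adj' else 'New'"/>
      </type>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The map is built once, so each row costs a single hash lookup instead of a scan over all nm:ID elements, which is where the exists() version loses its time.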

Related

Grouping 3 elements into a node

I have the following XML, where the element names have a number in them.
<Root>
<Row>
<Coverage>Partial</Coverage>
<Admission>Self</Admission>
<Sequence1>1</Sequence1>
<Qualifier1>221</Qualifier1>
<Date1>2017-06-01</Date1>
<Sequence2>2</Sequence2>
<Qualifier2>222</Qualifier2>
<Date2>2022-05-06</Date2>
</Row>
<Row>
<Coverage>Partial</Coverage>
<Admission>Self</Admission>
<Sequence1>1</Sequence1>
<Qualifier1>321</Qualifier1>
<Date1>2017-06-01</Date1>
<Sequence2>2</Sequence2>
<Qualifier2>322</Qualifier2>
<Date2>2022-05-06</Date2>
</Row>
<Row>
<Coverage>Full</Coverage>
<Admission>Self</Admission>
<Sequence1>1</Sequence1>
<Qualifier1>421</Qualifier1>
<Date1>2017-06-01</Date1>
<Sequence2>2</Sequence2>
<Qualifier2>422</Qualifier2>
<Date2>2022-05-06</Date2>
</Row>
</Root>
I would like to group Sequence, Qualifier and Date into a group node called Benefit, as shown below. Also, rows should be merged based on the value of the first element, "Coverage". The XML output should be as below.
<Root>
<Row>
<Coverage>Partial</Coverage>
<Admission>Self</Admission>
<Benefits>
<Benefit>
<Sequence>1</Sequence>
<Qualifier>221</Qualifier>
<Date>2017-06-01</Date>
</Benefit>
<Benefit>
<Sequence>2</Sequence>
<Qualifier>222</Qualifier>
<Date>2022-05-06</Date>
</Benefit>
<Benefit>
<Sequence>3</Sequence>
<Qualifier>321</Qualifier>
<Date>2017-06-01</Date>
</Benefit>
<Benefit>
<Sequence>4</Sequence>
<Qualifier>322</Qualifier>
<Date>2022-05-06</Date>
</Benefit>
</Benefits>
</Row>
<Row>
<Coverage>Full</Coverage>
<Admission>Self</Admission>
<Benefits>
<Benefit>
<Sequence>1</Sequence>
<Qualifier>421</Qualifier>
<Date>2017-06-01</Date>
</Benefit>
<Benefit>
<Sequence>2</Sequence>
<Qualifier>422</Qualifier>
<Date>2022-05-06</Date>
</Benefit>
</Benefits>
</Row>
</Root>
Any help is greatly appreciated.
Here's a simple method, based on the assumption that the 3 elements to be grouped will always come in a contiguous block that starts with a SequenceX:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
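<xsl:template match="/Root">
<xsl:copy>
<xsl:apply-templates select="Row"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Row">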
<xsl:copy>
<xsl:copy-of select="Coverage | Admission"/>
<Benefits>
<xsl:for-each select="*[starts-with(name(), 'Sequence')]">
<Benefit>
<xsl:for-each select=". | following-sibling::*[position() &lt; 3]">
<xsl:element name="{translate(name(), '0123456789', '')}">
<xsl:value-of select="." />
</xsl:element>
</xsl:for-each>
</Benefit>
</xsl:for-each>
</Benefits>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
If the above assumption is not true, you can perform actual grouping based on the number included in the element names:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
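<xsl:template match="/Root">
<xsl:copy>
<xsl:apply-templates select="Row"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Row">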
<xsl:copy>
<xsl:copy-of select="Coverage | Admission"/>
<Benefits>
<xsl:for-each-group select="*[matches(name(), '^Sequence|^Qualifier|^Date')]" group-by="replace(name(), '\D', '')">
<Benefit>
<xsl:for-each select="current-group()">
<xsl:element name="{replace(name(), '\d', '')}">
<xsl:value-of select="." />
</xsl:element>
</xsl:for-each>
</Benefit>
</xsl:for-each-group>
</Benefits>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
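Neither stylesheet above merges rows that share the same Coverage value, which the question also asks for. A possible XSLT 2.0 sketch of that step follows. It assumes that each Sequence element is immediately followed by its Qualifier and Date, that Admission can be taken from the first row of each group, and that the Sequence values are renumbered across the merged rows as in the expected output.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/Root">
    <xsl:copy>
      <!-- One output Row per distinct Coverage value -->
      <xsl:for-each-group select="Row" group-by="Coverage">
        <Row>
          <!-- The context item is the first Row of the group -->
          <xsl:copy-of select="Coverage | Admission"/>
          <Benefits>
            <!-- Every SequenceN of every Row in the group, in document order -->
            <xsl:for-each select="current-group()/*[starts-with(name(), 'Sequence')]">
              <Benefit>
                <!-- Renumber across the merged rows -->
                <Sequence>
                  <xsl:value-of select="position()"/>
                </Sequence>
                <!-- The next two siblings are assumed to be QualifierN and DateN -->
                <xsl:for-each select="following-sibling::*[position() &lt; 3]">
                  <xsl:element name="{replace(name(), '\d', '')}">
                    <xsl:value-of select="."/>
                  </xsl:element>
                </xsl:for-each>
              </Benefit>
            </xsl:for-each>
          </Benefits>
        </Row>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

If the original Sequence values should be kept instead of renumbered, the explicit Sequence element can be dropped and the inner loop changed to select ". | following-sibling::*[position() &lt; 3]", as in the XSLT 1.0 version.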

Group XML elements with comma separated values with XSLT program

We are new to XSLT programming; can you please help us with an XSLT program?
We need to group XML elements based on the "id" tag and concatenate the values of the other XML tag with commas.
input xml file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<id>123</id>
<functional_manager__c.users>1234567</functional_manager__c.users>
</row>
<row>
<id>123</id>
<functional_manager__c.users>1200000</functional_manager__c.users>
</row>
<row>
<id>111</id>
<functional_manager__c.users>11111111</functional_manager__c.users>
</row>
<row>
<id>111</id>
<functional_manager__c.users>2222222</functional_manager__c.users>
</row>
<row>
<id>123</id>
<editor__v.users>1234567</editor__v.users>
</row>
<row>
<id>123</id>
<editor__v.users>1200000</editor__v.users>
</row>
<row>
<id>111</id>
<learning_partner__c.users>11111111</learning_partner__c.users>
</row>
<row>
<id>111</id>
<learning_partner__c.users>2222222</learning_partner__c.users>
</row>
</root>
Required Output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<id>123</id>
<functional_manager__c.users>1234567,1200000</functional_manager__c.users>
</row>
<row>
<id>111</id>
<functional_manager__c.users>11111111,2222222</functional_manager__c.users>
</row>
<row>
<id>123</id>
<editor__v.users>1234567,1200000</editor__v.users>
</row>
<row>
<id>111</id>
<learning_partner__c.users>11111111,2222222</learning_partner__c.users>
</row>
</root>
code we tried:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" exclude-result-prefixes="xsl wd xsd this env"
xmlns:wd="urn:com.workday/bsvc"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:this="urn:this-stylesheet">
<xsl:output indent="yes" method="xml"/>
<xsl:template match="/">
<Sharingsettings>
<xsl:for-each-group select="/root/row" group-by="id">
<row>
<ID>
<xsl:value-of select="id"/>
</ID>
<functional_manager__c.users>
<xsl:value-of select="//current-group()//functional_manager__c.users">
</xsl:value-of>
</functional_manager__c.users>
</row>
</xsl:for-each-group>
</Sharingsettings>
</xsl:template>
</xsl:stylesheet>
We are trying with an XSLT program, but it does not give the required output.
Thank you so much in advance
With XSLT 3 you can use
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:output method="xml" indent="yes"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="row" group-adjacent="id">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<xsl:template match="row/*[not(self::id)]">
<xsl:copy>
<xsl:value-of select="current-group()/*[node-name() = node-name(current())]" separator=","/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
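If only an XSLT 2.0 processor is available (the stylesheet in the question declares version 2.0), roughly the same result can be sketched with nested grouping. This is a sketch under the assumption that rows with the same id are adjacent, as in the sample input, and that every row holds one id plus one or more value elements.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/root">
    <xsl:copy>
      <!-- Consecutive rows with the same id form one group -->
      <xsl:for-each-group select="row" group-adjacent="id">
        <row>
          <!-- id is taken from the first row of the group -->
          <xsl:copy-of select="id"/>
          <!-- Within the group, collect the remaining elements by element name -->
          <xsl:for-each-group select="current-group()/*[not(self::id)]" group-by="name()">
            <xsl:element name="{name()}">
              <xsl:value-of select="current-group()" separator=","/>
            </xsl:element>
          </xsl:for-each-group>
        </row>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Using group-adjacent rather than group-by keeps the second block of id 123 rows as a separate output row, which is what the required output shows.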

Add attributes from child node to parent

After reading a lot about this question already, I still cannot find a final solution for my problem, as I am an absolute beginner with XSL.
I want to add all attributes of child nodes to parent level.
This is what I have:
<rankings date="2021-03-15">
<ranking rank="1" rank_change="0" points="12008">
<player initials="" nationality="SRB" last_name="Djokovic" first_name="Novak" id="7" display_name="Novak Djokovic"/>
</ranking>
<ranking rank="2" rank_change="1" points="9940">
<player initials="" nationality="RUS" last_name="Medvedev" first_name="Daniil" id="35844" display_name="Daniil Medvedev"/>
</ranking>
<ranking rank="3" rank_change="-1" points="9670">
<player initials="" nationality="ESP" last_name="Nadal" first_name="Rafael" id="4" display_name="Rafael Nadal"/>
</ranking>
</rankings>
This is what I tried (I think it is missing an identity transform):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="rankings">
<data>
<xsl:apply-templates select="*"/>
</data>
</xsl:template>
<xsl:template match="ranking | player">
<row>
<xsl:apply-templates select="@* | node()"/>
</row>
</xsl:template>
<xsl:template match="ranking/@* | player/@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
With following result:
<data>
<row>
<rank>1</rank>
<rank_change>0</rank_change>
<points>12008</points>
<row>
<initials/>
<nationality>SRB</nationality>
<last_name>Djokovic</last_name>
<first_name>Novak</first_name>
<id>7</id>
<display_name>Novak Djokovic</display_name>
</row>
</row>
</data>
This is my goal:
<data>
<row>
<rank>1</rank>
<rank_change>0</rank_change>
<points>12008</points>
<initials/>
<nationality>SRB</nationality>
<last_name>Djokovic</last_name>
<first_name>Novak</first_name>
<id>7</id>
<display_name>Novak Djokovic</display_name>
</row>
</data>
I hope one of you can help me with this.
Cheers,
Phil
Try splitting ranking and player into their own templates:
<xsl:template match="ranking">
<row>
<xsl:apply-templates select="@* | node()"/>
</row>
</xsl:template>
<xsl:template match="player">
<xsl:apply-templates select="@* | node()"/>
</xsl:template>
Result:
<data>
<row>
<rank>1</rank>
<rank_change>0</rank_change>
<points>12008</points>
<initials/>
<nationality>SRB</nationality>
<last_name>Djokovic</last_name>
<first_name>Novak</first_name>
<id>7</id>
<display_name>Novak Djokovic</display_name>
</row>
<row>
<rank>2</rank>
<rank_change>1</rank_change>
<points>9940</points>
<initials/>
<nationality>RUS</nationality>
<last_name>Medvedev</last_name>
<first_name>Daniil</first_name>
<id>35844</id>
<display_name>Daniil Medvedev</display_name>
</row>
<row>
<rank>3</rank>
<rank_change>-1</rank_change>
<points>9670</points>
<initials/>
<nationality>ESP</nationality>
<last_name>Nadal</last_name>
<first_name>Rafael</first_name>
<id>4</id>
<display_name>Rafael Nadal</display_name>
</row>
</data>
If I am guessing correctly what your real goal is, you could do simply:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="rankings">
<data>
<xsl:for-each select="ranking">
<row>
<xsl:for-each select=".//@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
</row>
</xsl:for-each>
</data>
</xsl:template>
</xsl:stylesheet>

Grouping and moving remaining nodes using XSLT 1.0

I have the following xml,
<?xml version="1.0" encoding="UTF-8"?>
<response>
<case>
<CMEDIA>Phone</CMEDIA>
</case>
<results>
<row>
<IKEY>TestKey1</IKEY>
<OBJECTID>TestObject1</OBJECTID>
</row>
<row>
<IKEY>TestKey1</IKEY>
<OBJECTID>TestObject2</OBJECTID>
</row>
<row>
<IKEY>TestKey1</IKEY>
<OBJECTID>TestObject3</OBJECTID>
</row>
<row>
<IKEY>TestKey4</IKEY>
<OBJECTID>TestObject4</OBJECTID>
</row>
</results>
</response>
My requirement is to group all rows with a matching <IKEY> into one <row> and to move all of their <OBJECTID> nodes under that new <row>.
<?xml version="1.0" encoding="UTF-8"?>
<response>
<case>
<CMEDIA>Phone</CMEDIA>
</case>
<results>
<row>
<IKEY>TestKey1</IKEY>
<OBJECTID>TestObject1</OBJECTID>
<OBJECTID>TestObject2</OBJECTID>
<OBJECTID>TestObject3</OBJECTID>
</row>
<row>
<IKEY>TestKey4</IKEY>
<OBJECTID>TestObject4</OBJECTID>
</row>
</results>
</response>
I am trying the following XSL for grouping based on <IKEY>, but I am not able to move all <OBJECTID> nodes to the new <row> (I have to use XSLT 1.0 only).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:key name="ikey" match="row" use="string(IKEY)" />
<xsl:template match="results">
<xsl:copy>
<xsl:apply-templates select="row[generate-id() = generate-id(key('ikey', string(IKEY))[1])]" mode="ikey" />
</xsl:copy>
</xsl:template>
<xsl:template match="row" mode="ikey">
<xsl:choose>
<xsl:when test="IKEY">
<row>
<xsl:apply-templates select="IKEY|OBJECTID" />
</row>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Can somebody tell me what I am missing here?
Change
<xsl:apply-templates select="IKEY|OBJECTID" />
to
<xsl:apply-templates select="IKEY|key('ikey', IKEY)/OBJECTID" />
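The change works because key('ikey', IKEY) returns every row that shares the current row's IKEY, so the union selects this row's IKEY followed by the OBJECTID of each row in the group. Put together, the whole stylesheet could look like the sketch below. It assumes the input shown in the question; xsl:strip-space and the use of xsl:copy for the row are additions made here only to keep the indented output tidy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Index every row by the value of its IKEY child -->
  <xsl:key name="ikey" match="row" use="IKEY"/>
  <!-- Identity template: copies response, case, IKEY and OBJECTID unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Muenchian grouping: process only the first row of each IKEY group -->
  <xsl:template match="results">
    <xsl:copy>
      <xsl:apply-templates select="row[generate-id() = generate-id(key('ikey', IKEY)[1])]" mode="ikey"/>
    </xsl:copy>
  </xsl:template>
  <!-- Output this row's IKEY plus the OBJECTID of every row in the same group -->
  <xsl:template match="row" mode="ikey">
    <xsl:copy>
      <xsl:apply-templates select="IKEY | key('ikey', IKEY)/OBJECTID"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```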

How to create a hierarchy using XSLT

I'm new to XSL and am looking for a way to solve a problem. I have XML something like this:
<Table>
<Row Id="1">
<Field1>"P_907"</Field1>
<Field2>"5912"</Field2>
<Field3>"2013/05/31"</Field3>
<Field4>"2013/05/31"</Field4>
</Row>
<Row Id="2">
<Field1>"2.1.1.M5"</Field1>
</Row>
<Row Id="3">
<Field1>"3.1.1.M5"</Field1>
</Row>
<Row Id="4">
<Field1>"P_908"</Field1>
<Field2>"5913"</Field2>
<Field3>"2013/05/31"</Field3>
<Field4>"2013/05/31"</Field4>
</Row>
<Row Id="5">
<Field1>"3.11.M2"</Field1>
</Row>
</Table>
Here Row Id=1 and Row Id=4 are invoice headers and the remaining rows are invoice lines. Every invoice header has its ID in Field1, but the invoice lines carry no invoice ID. When a row has no Field3, it is an invoice line; otherwise it is an invoice header. All lines that come before a header row belong to the previous header row. How can I create XML with the proper invoice hierarchy using XSLT?
Output xml could be like:
<Invoice>
<Field1>"P_907"</Field1>
<Field2>"5912"</Field2>
<Field3>"2013/05/31"</Field3>
<Field4>"2013/05/31"</Field4>
<Row>
<Field1>"2.1.1.M5"</Field1>
</Row>
<Row>
<Field1>"3.1.1.M5"</Field1>
</Row>
</Invoice>
<Invoice>
<Field1>"P_908"</Field1>
<Field2>"5913"</Field2>
<Field3>"2013/05/31"</Field3>
<Field4>"2013/05/31"</Field4>
<Row>
<Field1>"3.11.M2"</Field1>
</Row>
</Invoice>
I would do this using keys, as follows:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:key name="Rows" match="Row[not(Field3)]" use="generate-id(preceding-sibling::Row[Field3][1])"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Row[Field3]">
<Invoice>
<xsl:apply-templates select="node()"/>
<xsl:apply-templates select="key('Rows', generate-id())" mode="followingRows"/>
</Invoice>
</xsl:template>
<xsl:template match="Row" mode="followingRows">
<xsl:copy><xsl:apply-templates select="node()"/></xsl:copy>
</xsl:template>
<xsl:template match="Row"/>
</xsl:stylesheet>
One solution is the following XSLT:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="Table">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="Row[Field3]">
<xsl:variable name="invoice-count" select="count(preceding-sibling::Row[Field3]) + 1"/>
<Invoice>
<xsl:apply-templates/>
<xsl:apply-templates select="following-sibling::Row[not(Field3)
and not(count(preceding-sibling::Row[Field3]) > $invoice-count)]" mode="copy"/>
</Invoice>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Row" mode="copy">
<xsl:copy>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Row"/>
</xsl:transform>
when applied to your input XML produces the output
<Invoice>
<Field1>"P_907"</Field1>
<Field2>"5912"</Field2>
<Field3>"2013/05/31"</Field3>
<Field4>"2013/05/31"</Field4>
<Row>
<Field1>"2.1.1.M5"</Field1>
</Row>
<Row>
<Field1>"3.1.1.M5"</Field1>
</Row>
</Invoice>
<Invoice>
<Field1>"P_908"</Field1>
<Field2>"5913"</Field2>
<Field3>"2013/05/31"</Field3>
<Field4>"2013/05/31"</Field4>
<Row>
<Field1>"3.11.M2"</Field1>
</Row>
</Invoice>
One template matches all Row elements that contain a Field3:
<xsl:template match="Row[Field3]">
This template writes an <Invoice> node and copies the content of this Row by applying templates. Then all following sibling Row elements that have no Field3, and whose number of preceding sibling Row elements with Field3 does not exceed $invoice-count (that is, the rows that come before the next header), are copied by applying the template with mode="copy".
That template copies the content of the Row but not its attributes, so the Id attribute of the Row is removed from the output.
To avoid writing the Row elements twice, the empty template
<xsl:template match="Row"/> matches all Row nodes that are already handled by applying templates in the template that matches the Row elements with Field3.
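If an XSLT 2.0 processor is available, the same hierarchy can be sketched more directly with positional grouping. The stylesheet below is only a sketch and assumes that the first Row in Table is a header (a Row with Field3); otherwise the leading lines would end up in an Invoice of their own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/Table">
    <!-- A new group starts at every header row, i.e. every Row that has a Field3 -->
    <xsl:for-each-group select="Row" group-starting-with="Row[Field3]">
      <Invoice>
        <!-- Fields of the header row become direct children of Invoice -->
        <xsl:copy-of select="current-group()[1]/*"/>
        <!-- The remaining rows of the group are the invoice lines -->
        <xsl:for-each select="current-group()[position() gt 1]">
          <Row>
            <!-- Copy the fields only, so the Id attribute is dropped -->
            <xsl:copy-of select="*"/>
          </Row>
        </xsl:for-each>
      </Invoice>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

As with the XSLT 1.0 stylesheets above, the result is a sequence of Invoice elements without a single wrapping root element.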