XSLT: merging files with better performance

I have two XML files and want to merge them. The merge criterion is illustrated by the following example.
nodes1.xml file content:
<nodes>
<node>
<type>a</type>
<name>joe</name>
</node>
<node>
<type>b</type>
<name>sam</name>
</node>
<node>
<type>c</type>
<name>pez</name>
</node>
<node>
<type>g</type>
<name>lua</name>
</node>
<node>
<type>a</type>
<name>tol</name>
</node>
<node>
<type>c</type>
<name>jua</name>
</node>
</nodes>
nodes2.xml file content:
<nodes>
<node>
<type>a</type>
<name>jill</name>
</node>
<node>
<type>c</type>
<name>imol</name>
</node>
<node>
<type>h</type>
<name>teli</name>
</node>
<node>
<type>f</type>
<name>jopp</name>
</node>
<node>
<type>c</type>
<name>zolh</name>
</node>
</nodes>
and with my XSL template I get:
<?xml version="1.0" encoding="UTF-8"?>
<nodes>
<node tipo="a">
<name>joe</name>
<name>tol</name>
<name>jill</name>
</node>
<node tipo="c">
<name>pez</name>
<name>jua</name>
<name>imol</name>
<name>zolh</name>
</node>
<node tipo="h">
<name>teli</name>
</node>
<node tipo="f">
<name>jopp</name>
</node>
</nodes>
I need a solution to get better performance.
My current solution is:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:variable name="Source2" select="document('nodes2.xml')/nodes/node"/>
<xsl:variable name="Source1" select="document('nodes1.xml')/nodes/node"/>
<xsl:template match="/nodes" >
<nodes>
<xsl:for-each-group select="node" group-by="type">
<node tipo="{type}">
<xsl:apply-templates select="$Source1[type=current-grouping-key()]/name"/>
<xsl:apply-templates select="$Source2[type=current-grouping-key()]/name"/>
</node>
</xsl:for-each-group>
</nodes>
</xsl:template>
<xsl:template match="name">
<name><xsl:value-of select="."/></name>
</xsl:template>
</xsl:stylesheet>
I run it with Saxon on Java:
$ java net.sf.saxon.Transform nodes2.xml mysolution.xsl
I think it is a shame to load the input file into a variable as well, but I cannot figure out how to do it differently.
I would appreciate any help or pointers.
--Paulino

Assuming the second file is the primary input to the XSLT code, you can use the following:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:param name="source1-uri" select="'nodes1.xml'"/>
<xsl:variable name="doc1" select="doc($source1-uri)"/>
<xsl:key name="by-type" match="nodes/node" use="type"/>
<xsl:template match="/nodes" >
<nodes>
<xsl:for-each-group select="key('by-type', node/type, $doc1), node" group-by="type">
<node tipo="{current-grouping-key()}">
<xsl:copy-of select="for $n in current-group() return $n/name"/>
</node>
</xsl:for-each-group>
</nodes>
</xsl:template>
</xsl:stylesheet>
I am not sure whether the order of the merged name elements matters to you, but to get the order shown in your sample result with Saxon 9.5 I had to use <xsl:copy-of select="for $n in current-group() return $n/name"/> instead of the shorter and more usual <xsl:copy-of select="current-group()/name"/>. The path expression current-group()/name returns the name elements in document order, and the relative order of nodes from two different documents is implementation-dependent, whereas the for expression keeps the order of current-group().
This solution should be more efficient, mainly because it groups all input nodes in one pass and then simply uses current-group() instead of selecting the nodes again with a predicate.

Related

Getting Unique values and adding the values in XSLT

Hi, I am pretty new to XSLT, so I need some help with some simple XSL code.
My input XML:
<?xml version="1.0" encoding="ASCII"?>
<Node Name="Person" Received="1" Good="1" Bad="0" Condition="byPerson:1111">
</Node>
<Node Name="Person" Received="1" Good="1" Bad="0" Condition="byPerson:1111">
</Node>
<Node Name="Person" Received="1" Good="1" Bad="0" Condition="byPerson:2222">
</Node>
<Node Name="Person" Received="1" Good="1" Bad="0" Condition="byPerson:2222">
</Node>
<Node Name="Person" Received="1" Good="1" Bad="0" Condition="byPerson:3333">
</Node>
I expect the result to be the sum of all Received, Good and Bad values, but each unique Condition should be counted only once.
Something like this:
<?xml version="1.0" encoding="ASCII"?>
<Received>3</Received>
<Good>3</Good>
<Bad>0</Bad>
I was trying the code below, but so far I only get the sum of everything; I would like each 'Condition' to be counted only once.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:value-of select="sum(Node/@Received)"/>
<xsl:value-of select="sum(Node/@Good)"/>
<xsl:value-of select="sum(Node/@Bad)"/>
</xsl:template>
</xsl:stylesheet>
The following stylesheet uses an xsl:key to group the <Node> elements by the value of @Condition. It uses the Muenchian method, with key() and generate-id(), to select the first Node element for each unique @Condition, and then computes the sum() of the attributes of the selected Node elements.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output indent="yes"/>
<xsl:key name="nodesByCondition" match="Node" use="@Condition"/>
<xsl:template match="/">
<results>
<xsl:variable name="distinctNodes"
select="*/Node[generate-id() =
generate-id(key('nodesByCondition', @Condition)[1])]"/>
<Received>
<xsl:value-of select="sum($distinctNodes/@Received)"/>
</Received>
<Good><xsl:value-of select="sum($distinctNodes/@Good)"/></Good>
<Bad><xsl:value-of select="sum($distinctNodes/@Bad)"/></Bad>
</results>
</xsl:template>
</xsl:stylesheet>
In XSLT 2.0 you can use distinct-values().
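As a rough illustration of the XSLT 2.0 route: distinct-values() returns atomic values rather than nodes, so one way to get at the first Node per Condition is xsl:for-each-group. The sketch below assumes, like the XSLT 1.0 answer, that the Node elements sit under a single wrapper root element (the posted input has no root, so that wrapper is an assumption), and the variable name distinctNodes is simply reused from that answer.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <!-- Assumption: all Node elements are children of one wrapper root element. -->
    <xsl:variable name="distinctNodes" as="element(Node)*">
      <xsl:for-each-group select="*/Node" group-by="@Condition">
        <!-- Inside for-each-group the context item is the first Node of the current group. -->
        <xsl:sequence select="."/>
      </xsl:for-each-group>
    </xsl:variable>
    <results>
      <Received><xsl:value-of select="sum($distinctNodes/@Received)"/></Received>
      <Good><xsl:value-of select="sum($distinctNodes/@Good)"/></Good>
      <Bad><xsl:value-of select="sum($distinctNodes/@Bad)"/></Bad>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

The grouping replaces the key() and generate-id() comparison of the Muenchian method; the summing step is unchanged.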

How do I pair appropriate xml elements with xmlstarlet?

I have two sets of XML nodes, and I want to find elements that have identical "phone" child. For example:
<set1>
<node>
<phone>111</phone>
<name>John</name>
</node>
<node>
<phone>444</phone>
<name>Amy</name>
</node>
<node>
<phone>777</phone>
<name>Robin</name>
</node>
</set1>
<set2>
<node>
<phone>111</phone>
<city>Moscow</city>
</node>
<node>
<phone>444</phone>
<city>Prag</city>
</node>
<node>
<phone>999</phone>
<city>Rome</city>
</node>
</set2>
Now I want to get the following:
<result>
<node>
<phone>111</phone>
<name>John</name>
<city>Moscow</city>
</node>
<node>
<phone>444</phone>
<name>Amy</name>
<city>Prag</city>
</node>
<node>
<phone>777</phone>
<name>Robin</name>
</node>
<node>
<phone>999</phone>
<city>Rome</city>
</node>
</result>
I'm a beginner in XSLT. I managed to merge two XML files and put them in an HTML table, but this pairing is a level beyond me.
Use a key
<xsl:key name="phone" match="node" use="phone"/>
then group with Muenchian grouping as follows:
<xsl:template match="/">
<result>
<xsl:apply-templates select="//node[generate-id() = generate-id(key('phone', phone)[1])]"/>
</result>
</xsl:template>
<xsl:template match="node">
<xsl:copy>
<xsl:copy-of select="phone"/>
<xsl:copy-of select="key('phone', phone)/*[not(self::phone)]"/>
</xsl:copy>
</xsl:template>
For readability add
<xsl:output indent="yes"/>

xslt 1.0, select group of nodes with key

I want to select nodes based on some variables.
The XML code:
<data>
<prot seq="AAA">
<node num="1">1345</node>
<node num="1">11245</node>
<node num="2">88885</node>
</prot>
<prot seq="BBB">
<node num="1">678</node>
<node num="1">456</node>
<node num="2">6666</node>
</prot>
<prot seq="CCC">
<node num="1">111</node>
<node num="1">222</node>
<node num="2">333</node>
</prot>
</data>
The XML that I want
<data>
<prot seq="AAA">
<node num="1">1345</node>
<node num="2">88885</node>
</prot>
<prot seq="BBB">
<node num="1">678</node>
<node num="2">6666</node>
</prot>
<prot seq="CCC">
<node num="1">111</node>
<node num="2">333</node>
</prot>
</data>
So, my idea has been to group the nodes with an xsl:key element, and then do a for-each over them. For example:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:key name="by" match="/data/prot" use="concat(@seq,'|',node/@num)"/>
<xsl:template match="/">
<root>
<xsl:apply-templates select="/data/prot"/>
</root>
</xsl:template>
<xsl:template match="/data/prot">
<xsl:for-each select="./node">
<xsl:for-each select="key('by',concat(current()/../@seq,'|',current()/@num))">
node <xsl:value-of select="./node" />
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
but the output is not what I expected, and I cannot see what I am doing wrong. I would prefer to keep the for-each structure. It seems I am not using the xsl:key grouping features properly.
The unwanted output that I get:
<root>
node 1345
node 1345
node 678
node 678
node 111
node 111</root>
The code can be tested here:
http://www.xsltcake.com/slices/sgWUFu/20
Thanks!
The main problem in your code is that the key indexes prot elements, but what we want to de-duplicate (and need to index) is the node elements.
Here is a short and correct solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="nodeByParentAndNum" match="node"
use="concat(generate-id(..), '+', @num)"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<data>
<xsl:apply-templates/>
</data>
</xsl:template>
<xsl:template match=
"node
[not(generate-id()
=
generate-id(key('nodeByParentAndNum',
concat(generate-id(..), '+', @num)
)
[1]
)
)
]
"/>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<data>
<prot seq="AAA">
<node num="1">1345</node>
<node num="1">11245</node>
<node num="2">88885</node>
</prot>
<prot seq="BBB">
<node num="1">678</node>
<node num="1">456</node>
<node num="2">6666</node>
</prot>
<prot seq="CCC">
<node num="1">111</node>
<node num="1">222</node>
<node num="2">333</node>
</prot>
</data>
the wanted, correct result is produced:
<data>
<prot seq="AAA">
<node num="1">1345</node>
<node num="2">88885</node>
</prot>
<prot seq="BBB">
<node num="1">678</node>
<node num="2">6666</node>
</prot>
<prot seq="CCC">
<node num="1">111</node>
<node num="2">333</node>
</prot>
</data>

Check if string appears within node value in XSLT

I have the following XML:
<nodes>
<node>
<articles>125,1,9027</articles>
</node>
<node>
<articles>999,48,123</articles>
</node>
<node>
<articles>123,1234,4345,567</articles>
</node>
</nodes>
I need to write some XSLT which will return only nodes which have a particular article id, so in the example above, only those nodes which contain article 123.
My XSLT isn't great, so I'm struggling with this. I'd like to do something like this, but I know of course there isn't an 'instring' extension method in XSLT:
<xsl:variable name="currentNodeId" select="1234"/>
<xsl:for-each select="$allNodes [instring(articles,$currentNodeId)]">
<!-- Output stuff -->
</xsl:for-each>
I know this is hacky, but I am not sure of the best approach to tackle this. The node-set is likely to be huge, and so is the number of article ids inside each node, so I'm pretty sure that splitting the value of the node and turning it into a node-set isn't going to be very efficient, but I could be wrong!
Any help as to the best way to do this would be much appreciated, thanks.
XSLT 2.0: this matches nodes whose articles list contains 123 as a whole number, not as part of a longer number. Note that matches() requires an XSLT 2.0 processor.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="id" select="123"/>
<xsl:template match="/">
<xsl:for-each select="//node[matches(articles, concat('(^|\D)', $id, '($|\D)'))]">
<xsl:value-of select="current()"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Sample input :
<?xml version="1.0" encoding="utf-8"?>
<nodes>
<node>
<articles>1234,1000,9027</articles>
</node>
<node>
<articles>999,48,01234</articles>
</node>
<node>
<articles>123,1234,4345,567</articles>
</node>
<node>
<articles> 123 , 456 </articles>
</node>
</nodes>
Output :
123,1234,4345,567
123 , 456
I don't know how to do this efficiently with XSLT 1.0, but since the OP said he is using XSLT 2.0, this should be a sufficient answer.
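For comparison, here is a minimal sketch of an XSLT 2.0 alternative that splits the list instead of using a regular expression. It assumes the ids are separated by commas with optional surrounding whitespace, as in the sample input; the variable name id is taken from the answer above but is declared as a string here. Whether this is faster than matches() on large inputs has not been measured.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <!-- Declared as a string so it can be compared directly with the tokens. -->
  <xsl:variable name="id" select="'123'"/>
  <xsl:template match="/">
    <!-- Keep a node when at least one comma-separated token, trimmed, equals $id. -->
    <xsl:for-each select="//node[some $t in tokenize(articles, ',')
                                 satisfies normalize-space($t) = $id]">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The some ... satisfies form is used rather than a bare filtered sequence in the predicate, because a predicate that yields more than one string would raise a type error.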
In XSLT 1.0 you can use this simple solution, which relies on the normalize-space, translate, contains, substring and string-length functions.
Sample input XML:
<nodes>
<node>
<articles>125,1,9027</articles>
</node>
<node>
<articles>999,48,123</articles>
</node>
<node>
<articles>123,1234,4345,567</articles>
</node>
<node>
<articles> 123 , 456 </articles>
</node>
<node>
<articles>789, 456</articles>
</node>
<node>
<articles> 123 </articles>
</node>
<node>
<articles>456, 123 ,789</articles>
</node>
</nodes>
XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="id" select="123"/>
<xsl:template match="node">
<xsl:variable name="s" select="translate(normalize-space(articles/.), ' ', '')"/>
<xsl:if test="$s = $id
or contains($s, concat($id, ','))
or substring($s, string-length($s) - string-length($id) + 1, string-length($id)) = $id">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:template>
<xsl:template match="/nodes">
<xsl:copy>
<xsl:apply-templates select="node"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output:
<nodes>
<node>
<articles>999,48,123</articles>
</node>
<node>
<articles>123,1234,4345,567</articles>
</node>
<node>
<articles> 123 , 456 </articles>
</node>
<node>
<articles> 123 </articles>
</node>
<node>
<articles>456, 123 ,789</articles>
</node>
</nodes>
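One edge case worth checking in the XSLT 1.0 approach above: a test such as contains($s, concat($id, ',')) is also true for a longer id that merely ends in 123, for example 9123,456. A minimal sketch of a common workaround, assuming ids never contain commas or whitespace, pads both the list and the id with commas so that only whole entries match:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:variable name="id" select="123"/>
  <xsl:template match="/nodes">
    <xsl:copy>
      <!-- Strip whitespace, wrap the list in commas, then look for ",id," as a whole entry. -->
      <xsl:copy-of select="node[contains(
          concat(',', translate(articles, ' &#9;&#10;&#13;', ''), ','),
          concat(',', $id, ','))]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the padding in place, the separate checks for an exact match and for a match at the end of the string are no longer needed.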

How to remove duplicates based on level in hierarchy?

I have the following XML structure:
<node name="A">
<node name="B">
<node name="C"/>
<node name="D"/>
<node name="E"/>
</node>
<node name="D"/>
<node name="E"/>
</node>
I need to get all the leaf nodes. I use //node[not(node)] to get those. Now I need to remove duplicates by leaving elements that are deeper in hierarchy. How do I do that?
I. XSLT 1.0 Solution:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vallLeaves" select="//node()[not(node())]"/>
<xsl:template match="/">
$vallLeaves:
<xsl:copy-of select="$vallLeaves"/>
$vallDistinctLeaves:
<xsl:for-each select="$vallLeaves">
<xsl:if test=
"generate-id()
=
generate-id($vallLeaves[@name
=
current()/@name
]
[1]
)
">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<node name="A">
<node name="B">
<node name="C"/>
<node name="D"/>
<node name="E"/>
</node>
<node name="D"/>
<node name="E"/>
</node>
produces the wanted, correct result:
$vallLeaves:
<node name="C"/>
<node name="D"/>
<node name="E"/>
<node name="D"/>
<node name="E"/>
$vallDistinctLeaves:
<node name="C"/>
<node name="D"/>
<node name="E"/>
II. XSLT 2.0 Solution:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vallLeaves" select="//node()[not(node())]"/>
<xsl:variable name="vallDistinctLeaves" as="element()*">
<xsl:for-each-group select="$vallLeaves" group-by="@name">
<xsl:sequence select="."/>
</xsl:for-each-group>
</xsl:variable>
<xsl:template match="/">
$vallLeaves:
<xsl:sequence select="$vallLeaves"/>
$vallDistinctLeaves:
<xsl:sequence select="$vallDistinctLeaves"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the same XML document (above), the same correct results are produced:
$vallLeaves:
<node name="C"/>
<node name="D"/>
<node name="E"/>
<node name="D"/>
<node name="E"/>
$vallDistinctLeaves:
<node name="C"/>
<node name="D"/>
<node name="E"/>