Creating taxonomic structure from individual records - xslt

I am having a failure of imagination in how to effectively solve this problem. My actual data set has thousands of thousands of records. Each record indicates its location in a taxonomic structure. I need to create that taxonomic structure and place the records within that structure (e.g., records that indicate they go in "/a/b/c" end up in the "/a/b/c" but there is only one each of the taxonomic levels "a", "b", and "c"). Due to client confidentiality, I've posted a naive representation of this here.
Using XSLT 3 is desirable. I know there is a solution to this using xsl:iterate but I cannot figure it out.
Input:
<outer>
<record>
<id>rec1</id>
<taxNodes>
<node>
<id>1</id>
<note>First level</note>
<node>
<id>node2a</id>
<note>Second level Entry A</note>
</node>
</node>
</taxNodes>
</record>
<record>
<id>rec3</id>
<taxNodes>
<node>
<id>1</id>
<note>First level</note>
<node>
<id>node2b</id>
<note>Second level Entry B</note>
</node>
</node>
</taxNodes>
</record>
<record>
<id>rec4</id>
<taxNodes>
<node>
<id>1</id>
<note>First level</note>
<node>
<id>node2b</id>
<note>Second level Entry B</note>
</node>
</node>
</taxNodes>
</record>
</outer>
Desired Output:
<outer>
<node>
<id>1</id>
<note>First level</note>
<node>
<id>node2a</id>
<note>Second level Entry A</note>
<records>
<record>
<id>rec1</id>
</record>
</records>
</node>
<node>
<id>node2b</id>
<note>Second level Entry B</note>
<records>
<record>
<id>rec3</id>
</record>
<record>
<id>rec4</id>
</record>
</records>
</node>
</node>
</outer>

I think this can be seen as a grouping problem and then solved using a recursive function using xsl:for-each-group:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="xs math mf"
version="3.0">
<xsl:output indent="yes"/>
<xsl:function name="mf:group" as="node()*">
<xsl:param name="input-nodes" as="element(node)*"/>
<xsl:for-each-group select="$input-nodes" group-by="id">
<xsl:copy>
<xsl:copy-of select="id, note"/>
<xsl:choose>
<xsl:when test="current-group()/node">
<xsl:sequence select="mf:group(current-group()/node)"/>
</xsl:when>
<xsl:otherwise>
<records>
<xsl:apply-templates select="current-group()/ancestor::record"/>
</records>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:for-each-group>
</xsl:function>
<xsl:template match="outer">
<xsl:copy>
<xsl:sequence select="mf:group(record/taxNodes/node)"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record">
<xsl:copy>
<xsl:copy-of select="id"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
That gives the desired result I think for the input sample you have posted, I am not sure it will do for other inputs, mainly as I am not sure how variable the input can be, I think if I understand the spec 'there is only one each of the taxonomic levels "a", "b", and "c"' correctly then it should work fine.
As for having a huge input file and using XSLT 3.0 (with streaming?), I am not sure that a streaming solution is possible, due to the nature of the problem where we need to recursively group the whole set of input nodes.

This stylesheet provides a solution but does so using a function for looping.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:local="http://www.local.com"
exclude-result-prefixes="xs local"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="maxTaxonomyDepth" select="max(//node[not(node)]/count(ancestor-or-self::node))" as="xs:integer"/>
<xsl:sequence select="local:getTaxNodesForDepth(outer, 1, $maxTaxonomyDepth, '')" />
</xsl:template>
<xsl:function name="local:getTaxNodesForDepth">
<xsl:param name="out" as="element()" />
<xsl:param name="curDepth" as="xs:integer" />
<xsl:param name="maxDepth" as="xs:integer" />
<xsl:param name="parentId" as="xs:string*" />
<xsl:for-each select="distinct-values($out/record/taxNodes//node[count(ancestor-or-self::node) = $curDepth]
[if ($curDepth > 1) then parent::node/id/normalize-space(.) = $parentId else true()]
/id/normalize-space(.))">
<xsl:variable name="context" select="." as="xs:string" />
<node>
<xsl:sequence select="($out/record/taxNodes//node[count(ancestor-or-self::node) = $curDepth][id/normalize-space(.) = $context])[1]/(id | descriptor)" />
<xsl:apply-templates select="$out/record[taxNodes/descendant::node[last()][id/normalize-space(.) = $context]]" />
<xsl:choose>
<xsl:when test="$curDepth < $maxDepth">
<xsl:sequence select="local:getTaxNodesForDepth($out, $curDepth + 1, $maxDepth,
($out/record/taxNodes//node[count(ancestor-or-self::node) = $curDepth][id/normalize-space(.) = $context])[1]/id/normalize-space(.))" />
</xsl:when>
</xsl:choose>
</node>
</xsl:for-each>
</xsl:function>
<xsl:template match="taxNodes"/>
<xsl:template match="#*|node()" mode="#default">
<xsl:copy>
<xsl:apply-templates select="#*" mode="#current"/>
<xsl:apply-templates select="node()" mode="#current"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

Related

How to partition dates with XSLT

I have a group of dates and I'd like to create partitions with a criterion such as "exactly 7 days apart" For example this is my source xml:
<root>
<entry date="2019-05-12" />
<entry date="2019-05-19" />
<entry date="2019-05-26" />
<entry date="2019-06-16" />
<entry date="2019-06-23" />
</root>
The result should be like this:
<root>
<group>
<val>12.5.</val>
<val>19.5.</val>
<val>26.5.</val>
</group>
<group>
<val>16.6.</val>
<val>23.6.</val>
</group>
</root>
since the first three and the last two dates are all on a Sunday without a gap.
What I have so far is this:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:sd="urn:someprefix"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
>
<xsl:output indent="yes"/>
<xsl:template match="root">
<root>
<xsl:copy-of select="sd:partition(distinct-values(for $i in entry/#date return $i cast as xs:date))"/>
</root>
</xsl:template>
<xsl:function name="sd:partition">
<xsl:param name="dates" as="xs:date*"/>
<xsl:for-each-group select="$dates" group-adjacent="format-date(., '[F]')">
<group>
<xsl:for-each select="current-group()">
<val>
<xsl:value-of select="format-date(.,'[D].[M].')"/>
</val>
</xsl:for-each>
</group>
</xsl:for-each-group>
</xsl:function>
</xsl:stylesheet>
Which only generates one group.
How can I ask for the previous element to be 7 days apart? I know of duration (xs:dayTimeDuration('P1D')), but I don't know how to compare it to a previous value.
I use Saxon 9.8 HE.
I think you can also do it using group-adjacent:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="entry/#date/xs:date(.)"
group-adjacent=". - (position() - 1) * xs:dayTimeDuration('P7D')">
<group>
<xsl:apply-templates select="current-group()"/>
</group>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<xsl:template match=".[. instance of xs:date]">
<val>{format-date(.,'[D].[M].')}</val>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/ncdD7mM
To do your grouping, you really need to know the difference in days with the previous element, then you can group starting with dates where the difference is not 7 days. So, you can declare a variable where you build up some new XML with the dates and differences, and then use that to group.
Try this function in your XSLT instead.
<xsl:function name="sd:partition">
<xsl:param name="dates" as="xs:date*"/>
<xsl:variable name="datesWithDiff" as="element()*">
<xsl:for-each select="$dates">
<xsl:variable name="pos" select="position()" />
<date diff="{(. - $dates[$pos - 1]) div xs:dayTimeDuration('P1D')}">
<xsl:value-of select="." />
</date>
</xsl:for-each>
</xsl:variable>
<xsl:for-each-group select="$datesWithDiff" group-starting-with="date[#diff = '' or xs:int(#diff) gt 7]">
<group>
<xsl:for-each select="current-group()">
<val>
<xsl:value-of select="format-date(.,'[D].[M].')"/>
</val>
</xsl:for-each>
</group>
</xsl:for-each-group>
</xsl:function>

XSLT 1.0 transformation

I'm trying to transform a XML file with XSLT 1.0 but I'm having troubles with this.
Input:
<task_order>
<Q>
<record id="1">
<column name="task_externalId">SPLIT4_0</column>
</record>
<record id="2">
<column name="task_externalId">SPLIT4_1</column>
</record>
</Q>
<task>
<id>SPLIT4</id>
<name>test</name>
</task>
</task_order>
Wanted result:
For each task_order element: When there is more than 1 record-element (SPLIT4 and SPLIT4_1) I need to duplicate the task element and change the orginal task-id with the id from record elements.
<task_order>
<Q>
<record id="1">
<column name="task_externalId">SPLIT4_0</column>
</record>
<record id="2">
<column name="task_externalId">SPLIT4_1</column>
</record>
</Q>
<task>
<id>SPLIT4_0</id>
<name>test</name>
</task>
<task>
<id>SPLIT4_1</id>
<name>test</name>
</task>
</task_order>
Any suggestions?
Thank you
First start off with the Identity Template which will handle copying across all existing nodes
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
Next, to make looking up the columns slightly easier, consider using an xsl:key
<xsl:key name="column" match="column" use="substring-before(., '_')" />
Then, you have a template matching task where you can look up all matching column elements using the key, and create a new task element for each
<xsl:template match="task">
<xsl:variable name="task" select="." />
<xsl:for-each select="key('column', id)">
<!-- Create new task -->
</xsl:for-each>
</xsl:template>
Try this XSTL
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:key name="column" match="column" use="substring-before(., '_')" />
<xsl:template match="task">
<xsl:variable name="task" select="." />
<xsl:for-each select="key('column', id)">
<task>
<id><xsl:value-of select="." /></id>
<xsl:apply-templates select="$task/*[not(self::id)]" />
</task>
</xsl:for-each>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

Get distinct values from xml

My sample xml looks below: I need to get the distinct states from xml. I am using xslt 1.0 in vs 2010 editor.
<?xml version="1.0" encoding="utf-8" ?>
<states>
<node>
<value>2</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>NJ</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
</states>
My xslt looks like below:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
xmlns:user="urn:my-scripts">
<xsl:output method="text" indent="yes"/>
<xsl:key name="st" match="//states/node/state" use="." />
<xsl:variable name="disst">
<xsl:for-each select="//states/node[contains(value,1)]/state[generate-id()=generate-id(key('st',.)[1])]" >
<xsl:choose>
<xsl:when test="(position() != 1)">
<xsl:value-of select="concat(', ',.)" disable-output-escaping="yes"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:variable>
<xsl:template match="/" >
<xsl:value-of disable-output-escaping="yes" select="$disst"/>
</xsl:template>
</xsl:stylesheet>
Output: DE,NJ,NY
My above xml looks good for the above test xml.
If I change the xml as below:
<?xml version="1.0" encoding="utf-8" ?>
<states>
<node>
<value>2</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>DE</state>
</node>
<node>
<value>1</value>
<state>NJ</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
<node>
<value>1</value>
<state>NY</state>
</node>
</states>
It in not picking the state DE. Can any one suggest the suitable solution.Thanks in advance.
I need to find out the distinct states from the xml.
The problem here is your use of a predicate in your Muenchian grouping XPath:
[contains(value,1)]
This will often make Muenchian grouping fail to find all of the available distinct values. Instead, you should add the predicate to the key:
<xsl:key name="st" match="//states/node[contains(value, 1)]/state" use="." />
Alternatively, you can apply the predicate inside the grouping statement:
<xsl:apply-templates
select="//states/node
/state[generate-id() =
generate-id(key('st',.)[contains(../value, 1)][1])]" />
Full XSLT (with some improvements):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:user="urn:my-scripts">
<xsl:output method="text" indent="yes"/>
<xsl:key name="st" match="//states/node/state" use="." />
<xsl:variable name="a" select="1" />
<xsl:variable name="disst">
<xsl:apply-templates
select="//states/node
/state[generate-id() =
generate-id(key('st',.)[contains(../value, $a)][1])]" />
</xsl:variable>
<xsl:template match="state">
<xsl:if test="position() > 1">
<xsl:text>,</xsl:text>
</xsl:if>
<xsl:value-of select ="." disable-output-escaping="yes" />
</xsl:template>
<xsl:template match="/" >
<xsl:value-of disable-output-escaping="yes" select="$disst"/>
</xsl:template>
</xsl:stylesheet>
Result when run on your sample XML:
DE,NJ,NY

XSLT multiple stylesheets

I have the following xml
<TopLevel>
<data m="R263">
<s ut="263firstrecord" lt="2013-02-16T09:21:40.393" />
<s ut="263secondrecord" lt="2013-02-16T09:21:40.393" />
</data>
<data m="R262">
<s ut="262firstrecord" lt="2013-02-16T09:21:40.393" />
<s ut="262secondrecord" lt="2013-02-16T09:21:40.393" />
</data>
</TopLevel>
I have some XSLT that does the call template but it's not itterating correctly.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="data">
<xsl:value-of select="#m" />
<xsl:variable name="vYourName" select="#m"/>
<xsl:choose>
<xsl:when test="#m='R262'">
<xsl:call-template name="R262"/>
</xsl:when>
</xsl:choose>
<xsl:choose>
<xsl:when test="#m='R263'">
<xsl:call-template name="R263"/>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template name="R262">
<xsl:for-each select="/TopLevel/data/s">
Column1=<xsl:value-of select="#ut" />
Column2=<xsl:value-of select="#lt" />
</xsl:for-each>
</xsl:template>
<xsl:template name="R263">
<xsl:for-each select="/TopLevel/data/s">
Column1=<xsl:value-of select="#ut" />
Column2=<xsl:value-of select="#lt" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This gives me 8 records insead of the 4 (<s> level) records. I know it has to do with my iteration ... but I am not sure how to address this.
I am also aware of the apply stylesheets but I couldn't unravel that mystery either ... If someone can help me with XSLT that will only process everything from <TopLevel> to <\TopLevel> checking the value of m at the <data> level and applying the stylesheet at the <s> level for each <s> record I will be greateful beyond belief.
I don't know what output you want to produce, but I suspect you want to replace
<xsl:for-each select="/TopLevel/data/s">
by
<xsl:for-each select="s">
that is, you only want to process the "s" elements within the "data" you are currently processing, rather than selecting all the "s" elements in the whole document.
Why not do this using apply-templates?
<xsl:template match="data">
...
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="s[../#m='R262']">
...
</xsl:template>
<xsl:template match="s[../#m='R263']">
...
</xsl:template>
If you want to use match template and apply-templates you could do the following which gives you also a text output just like your stylesheet does. So this XSLT applied to your original source XML:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="data">
<xsl:value-of select="#m"/>
<xsl:apply-templates select="s"/>
</xsl:template>
<xsl:template match="s">
Column1=<xsl:value-of select="#ut"/>
Column2=<xsl:value-of select="#lt"/>
</xsl:template>
</xsl:stylesheet>
gives you this output:
<?xml version="1.0" encoding="UTF-8"?>
R263
Column1=263firstrecord
Column2=2013-02-16T09:21:40.393
Column1=263secondrecord
Column2=2013-02-16T09:21:40.393
R262
Column1=262firstrecord
Column2=2013-02-16T09:21:40.393
Column1=262secondrecord
Column2=2013-02-16T09:21:40.393
You basically only match on the s and give out the attributes "ut" and "lt". You can also output XML which would look better.
Using this XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<root>
<xsl:apply-templates/>
</root>
</xsl:template>
<xsl:template match="data">
<list>
<xsl:apply-templates select="s"/>
</list>
</xsl:template>
<xsl:template match="s">
<xsl:element name="record">
<xsl:attribute name="m">
<xsl:value-of select="parent::data/#m"/>
</xsl:attribute>
<item>Column1=<xsl:value-of select="#ut"/></item>
<item>Column2=<xsl:value-of select="#lt"/></item>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
will give you this nice XML output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<list>
<record m="R263">
<item>Column1=263firstrecord</item>
<item>Column2=2013-02-16T09:21:40.393</item>
</record>
<record m="R263">
<item>Column1=263secondrecord</item>
<item>Column2=2013-02-16T09:21:40.393</item>
</record>
</list>
<list>
<record m="R262">
<item>Column1=262firstrecord</item>
<item>Column2=2013-02-16T09:21:40.393</item>
</record>
<record m="R262">
<item>Column1=262secondrecord</item>
<item>Column2=2013-02-16T09:21:40.393</item>
</record>
</list>
You have to adapt the original XSLT a little bit to get a nice XML structure. Also when matching s you "climb" up to element data to get the R-numbers for your attribute values.
The template matching root you need for a proper XML root element. <list> you could also get rid off then you have <record> as child of <root>.

How do I combine or merge grouped nodes?

Using the XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
<xsl:output method="xml"/>
<xsl:template match="/">
<records>
<record>
<!-- Group record by bigID, for further processing -->
<xsl:for-each-group select="records/record" group-by="bigID">
<xsl:sort select="bigID"/>
<xsl:for-each select="current-group()">
<!-- Create new combined record -->
<bigID>
<!-- <xsl:value-of select="."/> -->
<xsl:for-each select=".">
<xsl:value-of select="bigID"/>
</xsl:for-each>
</bigID>
<text>
<xsl:value-of select="text"/>
</text>
</xsl:for-each>
</xsl:for-each-group>
</record>
</records>
</xsl:template>
</xsl:stylesheet>
I'm trying to change:
<?xml version="1.0" encoding="UTF-8"?>
<records>
<record>
<bigID>123</bigID>
<text>Contains text for 123</text>
<bigID>456</bigID>
<text>Some 456 text</text>
<bigID>123</bigID>
<text>More 123 text</text>
<bigID>123</bigID>
<text>Yet more 123 text</text>
</record>
</records>
into:
<?xml version="1.0" encoding="UTF-8"?>
<records>
<record>
<bigID>123
<text>Contains text for 123</text>
<text>More 123 text</text>
<text>Yet more 123 text</text>
</bigID>
<bigID>456
<text>Some 456 text</text>
</bigID>
</record>
</records>
Right now, I'm just listing the grouped <bigID>s, individually. I'm missing the step after grouping, where I combine the grouped <bigID> nodes. My suspicion is that I need to use the "key" function somehow, but I'm not sure.
Thanks for any help.
Here is the wanted XSLT 2.0 transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kTextforId" match="text"
use="preceding-sibling::bigID[1]"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record">
<record>
<xsl:for-each-group select="bigID" group-by=".">
<bigID>
<xsl:sequence select="current-grouping-key()"/>
<xsl:copy-of select=
"key('kTextforId', current-grouping-key())"/>
</bigID>
</xsl:for-each-group>
</record>
</xsl:template>
</xsl:stylesheet>
When performed on the provided XML document, the wanted result is produced.
In XSLT 2.0 you don't need keys for grouping.
Since you are just copying the text elements in the group, the inner for-each can be removed.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:template match="/">
<records>
<record>
<xsl:for-each-group select="records/record/bigID" group-by=".">
<xsl:sort select="." data-type="number" />
<bigID>
<xsl:value-of select="." />
<xsl:copy-of select="current-group()/following-sibling::text[1]" />
</bigID>
</xsl:for-each-group>
</record>
</records>
</xsl:template>
</xsl:stylesheet>
If you instead wanted to output the bigID elements followed by their text elements, then my loop would be replaced by the following.
<xsl:for-each-group select="records/record/bigID" group-by=".">
<xsl:sort select="." data-type="number" />
<xsl:copy-of select="." />
<xsl:copy-of select="current-group()/following-sibling::text[1]" />
</xsl:for-each-group>