XSLT: Merge a set of tree hierarchies

I have an XML document based on what Excel produces when saving as "XML Spreadsheet 2003 (*.xml)".
The spreadsheet itself contains a header section with a hierarchy of labels:
 | A    B    C    D    E    F    G    H    I
-+-----------------------------------------------------
1| a1                            a2
2| a11       a12       a13       a21  a22
3| a111 a112 a121 a122 a131 a132      a221 a222
This hierarchy is present on all sheets in the workbook, and looks more or less the same everywhere.
Excel XML works like an ordinary HTML table (<row>s that contain <cell>s). I have been able to transform everything into a tree structure like this:
<node title="a1" col="1">
<node title="a11" col="1">
<node title="a111" col="1"/>
<node title="a112" col="2"/>
</node>
<node title="a12" col="3">
<node title="a121" col="3" />
<node title="a122" col="4" />
</node>
<!-- and so on -->
</node>
But here is the complication:
- There is more than one worksheet, so there is a tree for each of them.
- The hierarchy may be slightly different on each sheet, so the trees will not be equal (for example, sheet 2 may have "a113" while the others don't).
- Tree depth is not explicitly limited.
- The labels, however, are meant to be the same across all sheets, which means they can be used for grouping.
I'd like to merge these separate trees into one that looks like this:
<node title="a1">
<col on="sheet1">1</col>
<col on="sheet2">1</col>
<node title="a11">
<col on="sheet1">1</col>
<col on="sheet2">1</col>
<node title="a111">
<col on="sheet1">1</col>
<col on="sheet2">1</col>
</node>
<node title="a112">
<col on="sheet1">2</col>
<col on="sheet2">2</col>
</node>
<node title="a113"><!-- different here -->
<col on="sheet2">3</col>
</node>
</node>
<node title="a12">
<col on="sheet1">3</col>
<col on="sheet2">4</col>
<node title="a121">
<col on="sheet1">3</col>
<col on="sheet2">4</col>
</node>
<node title="a122">
<col on="sheet1">4</col>
<col on="sheet2">5</col>
</node>
</node>
<!-- and so on -->
</node>
Ideally I'd like to do the merge before I even build the tree structure from the Excel XML (if you can get me started on this, that would be great). But since I have no idea how I would do this, a merge after the trees have been built (i.e. the situation described above) will be fine.
Thanks for your time. :)

Here is one possible solution in XSLT 1.0:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<t>
<xsl:apply-templates
select="node[@title='a1'][1]">
<xsl:with-param name="pOther"
select="node[@title='a1'][2]"/>
</xsl:apply-templates>
</t>
</xsl:template>
<xsl:template match="node">
<xsl:param name="pOther"/>
<node title="{@title}">
<col on="sheet1">
<xsl:value-of select="@col"/>
</col>
<xsl:choose>
<xsl:when test="not($pOther)">
<!-- select only child nodes so whitespace text is not copied -->
<xsl:apply-templates select="node" mode="copy">
<xsl:with-param name="pSheet" select="'sheet1'"/>
</xsl:apply-templates>
</xsl:when>
<xsl:otherwise>
<col on="sheet2">
<xsl:value-of select="$pOther/@col"/>
</col>
<xsl:for-each select=
"node[@title = $pOther/node/@title]">
<xsl:apply-templates select=".">
<xsl:with-param name="pOther" select=
"$pOther/node[@title = current()/@title]"/>
</xsl:apply-templates>
</xsl:for-each>
<xsl:apply-templates mode="copy" select=
"node[not(@title = $pOther/node/@title)]">
<xsl:with-param name="pSheet" select="'sheet1'"/>
</xsl:apply-templates>
<xsl:apply-templates mode="copy" select=
"$pOther/node[not(@title = current()/node/@title)]">
<xsl:with-param name="pSheet" select="'sheet2'"/>
</xsl:apply-templates>
</xsl:otherwise>
</xsl:choose>
</node>
</xsl:template>
<xsl:template match="node" mode="copy">
<xsl:param name="pSheet"/>
<node title="{@title}">
<col on="{$pSheet}">
<xsl:value-of select="@col"/>
</col>
<xsl:apply-templates select="node" mode="copy">
<xsl:with-param name="pSheet" select="$pSheet"/>
</xsl:apply-templates>
</node>
</xsl:template>
</xsl:stylesheet>
When the above transformation is applied on this XML document (the concatenation of the two XML documents under a common top node -- left as an exercise for the reader :) ):
<t>
<node title="a1" col="1">
<node title="a11" col="1">
<node title="a111" col="1"/>
<node title="a112" col="2"/>
</node>
<node title="a12" col="3">
<node title="a121" col="3" />
<node title="a122" col="4" />
</node>
<!-- and so on -->
</node>
<node title="a1" col="1">
<node title="a11" col="1">
<node title="a111" col="1"/>
<node title="a112" col="2"/>
<node title="a113" col="3"/>
</node>
<node title="a12" col="4">
<node title="a121" col="4" />
<node title="a122" col="5" />
</node>
<!-- and so on -->
</node>
</t>
The wanted result is produced:
<t>
<node title="a1">
<col on="sheet1">1</col>
<col on="sheet2">1</col>
<node title="a11">
<col on="sheet1">1</col>
<col on="sheet2">1</col>
<node title="a111">
<col on="sheet1">1</col>
<col on="sheet2">1</col>
</node>
<node title="a112">
<col on="sheet1">2</col>
<col on="sheet2">2</col>
</node>
<node title="a113">
<col on="sheet2">3</col>
</node>
</node>
<node title="a12">
<col on="sheet1">3</col>
<col on="sheet2">4</col>
<node title="a121">
<col on="sheet1">3</col>
<col on="sheet2">4</col>
</node>
<node title="a122">
<col on="sheet1">4</col>
<col on="sheet2">5</col>
</node>
</node>
</node>
</t>
Do note the following:
- We suppose that both top node elements have "a1" as the value of their title attribute. This can easily be generalized.
- The template matching node has a parameter named pOther, which holds the corresponding node element from the other tree. This template is applied only when $pOther exists.
- When no corresponding node element exists, another template, also matching node but in mode "copy", is applied. This template has a parameter named pSheet, whose value is the name (a string) of the sheet this element belongs to.

How about a callable template taking the sheet number as a parameter, which examines the input and returns the correct "col" node if it appears in that sheet's XML, and nothing if it doesn't. At each node, call it once for each sheet.
To merge the trees, you could use a template that looks for all children of the current node in any sheet and recurses on itself for each of them.
Sorry, no sample code: I find writing XSLT pretty slow, probably because I don't do it often, so I may well have missed something crucial. But putting it all together would give something like:
Get the title of "/node". With that title:
  - search all sheets for this title, emitting the "col" node for each;
  - search all sheets for children of nodes with this title (discarding duplicates);
  - recurse on each of those titles.
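The steps above could be sketched in XSLT 1.0 roughly as follows. This is only a sketch, and it rests on an assumption that is not part of the question: each sheet's trees are wrapped in a hypothetical <sheet name="..."> element under a common root <t>, and the sheet name is read from that wrapper. A named template receives every node, from any sheet, that sits under the same merged parent. It emits one merged node per distinct title and then recurses on the union of those nodes' children, so it should cope with any number of sheets and any depth.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Assumed input: <t><sheet name="sheet1"><node .../></sheet>...</t>
       (the <sheet> wrapper is hypothetical, not Excel's own markup) -->
  <xsl:template match="/t">
    <t>
      <xsl:call-template name="merge">
        <xsl:with-param name="pNodes" select="sheet/node"/>
      </xsl:call-template>
    </t>
  </xsl:template>

  <!-- pNodes: nodes from all sheets that sit under the same merged parent -->
  <xsl:template name="merge">
    <xsl:param name="pNodes"/>
    <xsl:for-each select="$pNodes">
      <!-- handle each title once: only at its first occurrence in document order -->
      <xsl:if test="generate-id() = generate-id($pNodes[@title = current()/@title][1])">
        <xsl:variable name="vSame" select="$pNodes[@title = current()/@title]"/>
        <node title="{@title}">
          <!-- one <col> per sheet in which this label occurs -->
          <xsl:for-each select="$vSame">
            <col on="{ancestor::sheet/@name}">
              <xsl:value-of select="@col"/>
            </col>
          </xsl:for-each>
          <!-- recurse on the children of every same-titled node, from all sheets -->
          <xsl:call-template name="merge">
            <xsl:with-param name="pNodes" select="$vSame/node"/>
          </xsl:call-template>
        </node>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Labels that occur only in a later sheet (such as a113) end up after all labels that already occur in earlier sheets at that level. The repeated $pNodes[@title = ...] filter is quadratic per level, which should be acceptable for a few header rows.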
Here are some snippets for removing duplicates in various ways:
http://www.dpawson.co.uk/xsl/sect2/N2696.html
Reading multiple documents can be done with the standard document() function, although details such as which URI schemes are supported vary between processors. If all else fails, a bit of cut-and-pastery with any old scripting language would probably do, provided that you know the files all have the same encoding, don't use conflicting IDs, and so on.
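If the per-sheet trees live in separate files, document() can stitch them into one input. This is a minimal sketch that makes two assumptions. The first is a hypothetical driver file, sheets.xml, that lists the files. The second is that each file's document element is the root of one sheet's tree.

```xslt
<!-- Hypothetical driver document (sheets.xml):
     <sheets>
       <sheet name="sheet1" href="sheet1.xml"/>
       <sheet name="sheet2" href="sheet2.xml"/>
     </sheets> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/sheets">
    <t>
      <xsl:for-each select="sheet">
        <sheet name="{@name}">
          <!-- with a node-set argument, relative hrefs resolve
               against the driver file's location -->
          <xsl:copy-of select="document(@href)/*"/>
        </sheet>
      </xsl:for-each>
    </t>
  </xsl:template>
</xsl:stylesheet>
```

Run it on sheets.xml. The result has one <sheet> wrapper per file, which keeps track of which sheet each tree came from for a later merge step.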

Related

xpath or xslt (1.0) to find max number of rows in a grid with blocks of arbitrary length

Context and ultimate objective
Consider the below XML which should create the grid in the image. Each col element represents a cell (whether empty or containing a region) with a width and length. For a given block, the starting row (latitude) is known, the ending one not.
Note there is no <row latitude="6"/> because that row is already used up as part of the Desert States and Deep South blocks. Similarly, <col timezone="PDT"/> is missing for row 3 because that cell is already taken up by North West.
I need to know how many rows I need to make the final grid. In this example, I would need 10 rows.
Question
My current approach is to work out the timezone that has the highest sum of length:
sum(//col[@timezone='EDT']/@length)
The problem with the above XPath is that the timezone is hardcoded here (and in the real application it is actually an axis with a very large set of possible values).
I've tried keys and Muenchian grouping but to no avail.
What xpath 1.0 or xslt 1.0 can I use?
XML
<rows>
<row latitude="1">
<cols>
<col timezone="PDT" width="1" length="1">Canada</col>
<col timezone="CDT" width="1" length="1">Canada</col>
<col timezone="EDT" width="1" length="1">Canada</col>
</cols>
</row>
<row latitude="2">
<cols>
<col timezone="PDT" width="1" length="2">North West</col>
<col timezone="CDT" width="1" length="1"></col>
<col timezone="EDT" width="1" length="1"></col>
</cols>
</row>
<row latitude="3">
<cols>
<col timezone="CDT" width="1" length="1"></col>
<col timezone="EDT" width="1" length="2">NY/NJ</col>
</cols>
</row>
<row latitude="4">
<cols>
<col timezone="PDT" width="1" length="3">Desert States</col>
<col timezone="CDT" width="1" length="1"></col>
</cols>
</row>
<row latitude="5">
<cols>
<col timezone="CDT" width="2" length="6">Deep South / Bahamas</col>
<col timezone="EDT" width="2" length="6">Deep South / Bahamas</col>
</cols>
</row>
<row latitude="7">
<cols>
<col timezone="PDT" width="1" length="2">California</col>
</cols>
</row>
</rows>
If (as I think) you want to know the largest sum of length of any timezone, you need to group the col elements by their timezone, sort the groups by their sum and get the sum value of the first (or the last, depending on the sort order) group.
Here's an example:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="col-by-TZ" match="col" use="@timezone" />
<xsl:template match="/rows">
<xsl:variable name="n">
<xsl:for-each select="row/cols/col[count(. | key('col-by-TZ', @timezone)[1]) = 1]">
<xsl:sort select="sum(key('col-by-TZ', @timezone)/@length)" data-type="number" order="descending"/>
<xsl:if test="position()=1">
<xsl:value-of select="sum(key('col-by-TZ', @timezone)/@length)"/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
<output>
<test>
<xsl:value-of select="$n"/>
</test>
</output>
</xsl:template>
</xsl:stylesheet>
Replace the test part with the actual logic you want to apply using the $n variable.
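Suppose, for example, that the goal is to emit one row element per required grid row. XSLT 1.0 has no range expression, so a recursive named template is the usual workaround. The sketch below assumes hypothetical output elements, <grid> and <row latitude="..."/>. $n is computed as in the answer above. It is a result tree fragment, so it is converted with number() before use.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:key name="col-by-TZ" match="col" use="@timezone"/>

  <xsl:template match="/rows">
    <!-- $n: the largest per-timezone sum of @length (Muenchian grouping) -->
    <xsl:variable name="n">
      <xsl:for-each select="row/cols/col[count(. | key('col-by-TZ', @timezone)[1]) = 1]">
        <xsl:sort select="sum(key('col-by-TZ', @timezone)/@length)" data-type="number" order="descending"/>
        <xsl:if test="position() = 1">
          <xsl:value-of select="sum(key('col-by-TZ', @timezone)/@length)"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <grid>
      <!-- $n is a result tree fragment; number() turns its text into a number -->
      <xsl:call-template name="emit-rows">
        <xsl:with-param name="pTo" select="number($n)"/>
      </xsl:call-template>
    </grid>
  </xsl:template>

  <!-- hypothetical helper: emits <row latitude="i"/> for i = pFrom .. pTo -->
  <xsl:template name="emit-rows">
    <xsl:param name="pFrom" select="1"/>
    <xsl:param name="pTo"/>
    <xsl:if test="$pFrom &lt;= $pTo">
      <row latitude="{$pFrom}"/>
      <xsl:call-template name="emit-rows">
        <xsl:with-param name="pFrom" select="$pFrom + 1"/>
        <xsl:with-param name="pTo" select="$pTo"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, CDT and EDT both sum to 10, so the grid should get ten row elements.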

XSLT: How to find source and target Xpath for the edge?

I want to write an XSLT file to transform an XMI file into a graph file, but the edges do not connect to the right source and target nodes. I have been trying for two weeks and am still confused. Please help me. Thanks a million.
The original code is:
<?xml version="1.0" encoding="UTF-8"?>
<xml xmlns:xmi="#">
<element xmi:id="BasicElement-Line1" name="Line1" xmi:type="association"/>
<element xmi:id="BasicElement-Line2" name="Line2" xmi:type="association"/>
<element xmi:id="BasicElement-Object1" name="Object1" xmi:type="class">
<ownedAttribute xmi:type="Property" name="input" type="BasicElement-Object2" association="BasicElement-Line1"/>
<ownedAttribute xmi:type="Property" name="output" type="BasicElement-Object3" association="BasicElement-Line2"/>
</element>
<element xmi:id="BasicElement-Object2" name="Object2" xmi:type="class">
</element>
<element xmi:id="BasicElement-Object3" name="Object3" xmi:type="class">
</element>
</xml>
and the output I am aiming for is:
<?xml version="1.0" encoding="UTF-8"?>
<xmi xmlns:y="##">
<edge target="N1002D" source="N1001B" id="N10005">
<y:PolyLineEdge>
<y:Arrows target="none" source="none" />
</y:PolyLineEdge>
</edge>
<edge target="N1002D" source="N1001B" id="N10010">
<y:PolyLineEdge>
<y:Arrows target="none" source="none" />
</y:PolyLineEdge>
</edge>
<node id="N1001B">
<y:NodeLabel>BasicElement-Object1</y:NodeLabel>
</node>
<node id="N1002D">
<y:NodeLabel>BasicElement-Object2</y:NodeLabel>
</node>
<node id="N10033">
<y:NodeLabel>BasicElement-Object3</y:NodeLabel>
</node>
</xmi>
Because there will be more "class" elements in the future, I used "{generate-id()}" to define the node IDs. But when I do that, the edges can no longer find their source and target nodes. I have been working on this for two weeks and have no idea how to solve it. Any help is really appreciated.
I'm not really familiar with XMI and the target format, but here's something that should fit your description.
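Source (note that, unlike in the question, each association element now has an ownedEnd child pointing at its source class):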
<?xml version="1.0" encoding="UTF-8"?>
<xml xmlns:xmi="#">
<element xmi:id="BasicElement-Line1" name="Line1" xmi:type="association">
<ownedEnd xmi:type="Property" type="BasicElement-Object1" association="BasicElement-Line1"/>
</element>
<element xmi:id="BasicElement-Line2" name="Line2" xmi:type="association">
<ownedEnd xmi:type="Property" type="BasicElement-Object1" association="BasicElement-Line2"/>
</element>
<element xmi:id="BasicElement-Object1" name="Object1" xmi:type="class">
<ownedAttribute xmi:type="Property" name="input" type="BasicElement-Object2" association="BasicElement-Line1"/>
<ownedAttribute xmi:type="Property" name="output" type="BasicElement-Object3" association="BasicElement-Line2"/>
</element>
<element xmi:id="BasicElement-Object2" name="Object2" xmi:type="class">
</element>
<element xmi:id="BasicElement-Object3" name="Object3" xmi:type="class">
</element>
</xml>
Transformed with (adjust the namespaces to the correct uris):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xmi="#" xmlns:y="##"
exclude-result-prefixes="xmi" version="1.0">
<xsl:output indent="yes"/>
<xsl:template match="xml">
<xmi>
<xsl:apply-templates select="element"/>
</xmi>
</xsl:template>
<xsl:template match="element[@xmi:type='class']">
<node id="{generate-id()}">
<y:NodeLabel>
<xsl:value-of select="@xmi:id"/>
</y:NodeLabel>
<y:UMLClassNode/>
</node>
</xsl:template>
<xsl:template match="element[@xmi:type='association']">
<!-- association name -->
<xsl:variable name="association" select="ownedEnd/@association"/>
<!-- id of source -->
<xsl:variable name="ownedEnd-type" select="ownedEnd/@type"/>
<!-- using association variable to select the correct id of target -->
<xsl:variable name="ownedAttribute-type"
select="//element[@xmi:id = $ownedEnd-type]/ownedAttribute[@association = $association]/@type"/>
<edge id="{ generate-id() }"
source="{ generate-id( /xml/element[@xmi:id = $ownedEnd-type] ) }"
target="{ generate-id( /xml/element[@xmi:id = $ownedAttribute-type] ) }">
<y:PolyLineEdge>
<y:Arrows source="none" target="none"/>
</y:PolyLineEdge>
</edge>
</xsl:template>
</xsl:stylesheet>
gives you:
<xmi xmlns:y="##">
<edge id="d0e3" source="d0e13" target="d0e20">
<y:PolyLineEdge>
<y:Arrows source="none" target="none"/>
</y:PolyLineEdge>
</edge>
<edge id="d0e8" source="d0e13" target="d0e23">
<y:PolyLineEdge>
<y:Arrows source="none" target="none"/>
</y:PolyLineEdge>
</edge>
<node id="d0e13">
<y:NodeLabel>BasicElement-Object1</y:NodeLabel>
<y:UMLClassNode/>
</node>
<node id="d0e20">
<y:NodeLabel>BasicElement-Object2</y:NodeLabel>
<y:UMLClassNode/>
</node>
<node id="d0e23">
<y:NodeLabel>BasicElement-Object3</y:NodeLabel>
<y:UMLClassNode/>
</node>
</xmi>
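If the input has to stay exactly as in the question (without ownedEnd), a key-based variant can recover both ends from the class side instead. The edge's source is the class that owns the ownedAttribute pointing at the association, and its target is the element named in that attribute's type. This is a sketch that assumes each association is referenced by exactly one ownedAttribute. As above, the placeholder namespace URIs need adjusting.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xmi="#" xmlns:y="##"
  exclude-result-prefixes="xmi">
  <xsl:output indent="yes"/>

  <!-- look up an element by its xmi:id -->
  <xsl:key name="element-by-id" match="element" use="@xmi:id"/>
  <!-- look up the ownedAttribute that refers to a given association -->
  <xsl:key name="attr-by-assoc" match="ownedAttribute" use="@association"/>

  <xsl:template match="xml">
    <xmi>
      <xsl:apply-templates select="element"/>
    </xmi>
  </xsl:template>

  <xsl:template match="element[@xmi:type='class']">
    <node id="{generate-id()}">
      <y:NodeLabel><xsl:value-of select="@xmi:id"/></y:NodeLabel>
    </node>
  </xsl:template>

  <xsl:template match="element[@xmi:type='association']">
    <!-- the attribute (owned by the source class) that uses this association -->
    <xsl:variable name="vAttr" select="key('attr-by-assoc', @xmi:id)[1]"/>
    <!-- generate-id() is stable within one run, so these match the node ids -->
    <edge id="{generate-id()}"
          source="{generate-id($vAttr/..)}"
          target="{generate-id(key('element-by-id', $vAttr/@type))}">
      <y:PolyLineEdge>
        <y:Arrows source="none" target="none"/>
      </y:PolyLineEdge>
    </edge>
  </xsl:template>
</xsl:stylesheet>
```

On the question's input this should connect Line1 from Object1 to Object2 and Line2 from Object1 to Object3. The keys also avoid the repeated //element scans.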

Why is there no sibling axis?

Looking at the available axes in XSLT, I found that there is no sibling axis that would be the union of preceding-sibling and following-sibling. To me this is a little surprising, since I already wrote one answer (XSLT issue...CSV issue.?) in which this axis would have been helpful (although I only have about 10 answers so far). Of course, you can always solve the problem by using the union, so this axis is not really required. But it would be very handy every once in a while, and, like all the other axes, IMHO it would make the code more readable and easier to maintain.
Does anybody know why this axis was left out? Is there maybe a non-obvious reason for this?
By the way: I found at least one question on StackExchange with a warning about potential performance degradation when using the preceding-sibling and following-sibling axes. But I assume this is true for any axis that covers a substantial portion of the XML tree when it is used in a nested way. So I doubt that performance was the reason for the omission.
Since there has been no activity on this question for a while, I would like to answer it myself. Picking up one thought in the comments, it is, of course, hard to say in retrospect why the people responsible for the XPath 1.0 specification (on which XSLT 1.0 builds) omitted the sibling axis.
One of the most conclusive reasons could be related to the comments by @JLRiche and @MichaelKay: axes are supposed to go in a specific direction with respect to the reference node, and it may be difficult to determine what the direction of a sibling axis would be.
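In order to investigate this a little further, I set up a test XSLT and a test input XML to check how the axes work (see further below), and in particular in what order the nodes of each axis are processed. The result was surprising to me at first:
- When iterated with xsl:for-each, the preceding-sibling nodes do not start at the node closest to the reference node but with the node closest to the start of the document.
- The following-sibling nodes do start right after the reference node.
The reason is that the node-set selected by a path expression is always processed in document order. Only a positional predicate inside the step itself (such as preceding-sibling::node[1]) counts from the reference node outward on a reverse axis.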
This would actually allow one to define
sibling := preceding-sibling | following-sibling
with the nodes in this set being continuously iterated from the beginning of the document to the end. There would be no "jump".
The suggested alternative
../node except .
also works well (in XPath 2.0) and yields the same set in the same order. However, when reading an unfamiliar stylesheet, I think a sibling axis would express the logic more clearly than the parent-children construct.
Interestingly, the same document-order iteration applies to preceding and ancestor. Note, however, the difference between two similar-looking expressions. (ancestor::node)[1] filters the whole node-set in document order and therefore returns the root node, as in the test below. In ancestor::node[1], by contrast, the predicate belongs to the reverse-axis step, so it returns the parent.
The original motivation for me to ask the question was related to not having to repeat a lengthy CONDITION imposed on the attributes of the nodes, e.g. I did not want to write
preceding-sibling::node[CONDITION] | following-sibling::node[CONDITION]
However, since the expression above can be rewritten as
(preceding-sibling::node | following-sibling::node)[CONDITION]
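the disadvantage of having to use two axes instead of a sibling axis is not as bad as I thought, at least as long as CONDITION is not positional (for example, [1] means something different once the union is parenthesized). In XSLT 2.0 the same also works with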
(../node except .)[CONDITION]
So, to answer my question: I don't think there is a good reason not to define a sibling axis. I guess nobody thought of it. :-)
Test Setup
This XML test input
<?xml version="1.0" encoding="ISO-8859-1"?>
<node id="1">
<node id="2">
<node id="3">
<node id="4"/>
<node id="5"/>
<node id="6"/>
</node>
<node id="7">
<node id="8"/>
<node id="9"/>
<node id="10"/>
</node>
<node id="11">
<node id="12"/>
<node id="13"/>
<node id="14"/>
</node>
</node>
<node id="15">
<node id="16">
<node id="17"/>
<node id="18"/>
<node id="19"/>
</node>
<node id="20">
<node id="21"/>
<node id="22"/>
<node id="23"/>
</node>
<node id="24">
<node id="25"/>
<node id="26"/>
<node id="27"/>
</node>
</node>
<node id="28">
<node id="29">
<node id="30"/>
<node id="31"/>
<node id="32"/>
</node>
<node id="33" value="A">
<node id="34"/>
<node id="35"/>
<node id="36"/>
</node>
<node id="37">
<node id="38"/>
<node id="39"/>
<node id="40"/>
</node>
<node id="41">
<node id="42"/>
<node id="43"/>
<node id="44"/>
</node>
<node id="45" value="A">
<node id="46"/>
<node id="47"/>
<node id="48"/>
</node>
</node>
</node>
using this XSLT 2.0 sheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:variable name="id" select="'37'"/>
<xsl:template name="dump">
<xsl:text> </xsl:text>
<xsl:value-of select="@id"/>
</xsl:template>
<xsl:template match="//node[@id = $id]">
<xsl:text>preceding siblings: </xsl:text>
<xsl:for-each select="preceding-sibling::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
following siblings: </xsl:text>
<xsl:for-each select="following-sibling::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
preceding and following siblings: </xsl:text>
<xsl:for-each select="preceding-sibling::node | following-sibling::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
preceding and following siblings with value A: </xsl:text>
<xsl:for-each select="(preceding-sibling::node | following-sibling::node)[@value = 'A']">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
following siblings: </xsl:text>
<xsl:for-each select="following-sibling::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
parent's children: </xsl:text>
<xsl:for-each select="../node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
parent's children except self: </xsl:text>
<xsl:for-each select="../node except .">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
parent's children except self with value A: </xsl:text>
<xsl:for-each select="(../node except .)[@value = 'A']">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
ancestors: </xsl:text>
<xsl:for-each select="ancestor::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
immediate ancestor: </xsl:text>
<xsl:for-each select="(ancestor::node)[1]">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
ancestors or self: </xsl:text>
<xsl:for-each select="ancestor-or-self::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
descendants: </xsl:text>
<xsl:for-each select="descendant::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
descendants or self: </xsl:text>
<xsl:for-each select="descendant-or-self::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
preceding: </xsl:text>
<xsl:for-each select="preceding::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
<xsl:text>
following: </xsl:text>
<xsl:for-each select="following::node">
<xsl:call-template name="dump"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
will yield this output
preceding siblings: 29 33
following siblings: 41 45
preceding and following siblings: 29 33 41 45
preceding and following siblings with value A: 33 45
following siblings: 41 45
parent's children: 29 33 37 41 45
parent's children except self: 29 33 41 45
parent's children except self with value A: 33 45
ancestors: 1 28
immediate ancestor: 1
ancestors or self: 1 28 37
descendants: 38 39 40
descendants or self: 37 38 39 40
preceding: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 29 30 31 32 33 34 35 36
following: 41 42 43 44 45 46 47 48
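The test above uses (ancestor::node)[1]. Without the parentheses, the result differs, because on a reverse axis a predicate inside the step counts positions from the context node outward. The following XSLT 1.0 sketch is meant to be run against the same test input, and it prints both forms for node 37.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//node[@id = '37']">
      <!-- predicate in the step: proximity position, nearest node first -->
      <xsl:text>preceding-sibling::node[1]: </xsl:text>
      <xsl:value-of select="preceding-sibling::node[1]/@id"/>
      <!-- parenthesized: predicate applies to the node-set in document order -->
      <xsl:text>&#10;(preceding-sibling::node)[1]: </xsl:text>
      <xsl:value-of select="(preceding-sibling::node)[1]/@id"/>
      <xsl:text>&#10;ancestor::node[1]: </xsl:text>
      <xsl:value-of select="ancestor::node[1]/@id"/>
      <xsl:text>&#10;(ancestor::node)[1]: </xsl:text>
      <xsl:value-of select="(ancestor::node)[1]/@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Expected output:
preceding-sibling::node[1]: 33
(preceding-sibling::node)[1]: 29
ancestor::node[1]: 28
(ancestor::node)[1]: 1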

Finding unique nodes with xslt

I have an xml document that contains some "Item" elements with ids. I want to make a list of the unique Item ids. The Item elements are not in a list though - they can be at any depth within the xml document - for example:
<Node>
<Node>
<Item id="1"/>
<Item id="2"/>
</Node>
<Node>
<Item id="1"/>
<Node>
<Item id="3"/>
</Node>
</Node>
<Item id="2"/>
</Node>
I would like the output 1,2,3 (or a similar representation). If this can be done with a single xpath then even better!
I have seen examples of this for lists of sibling elements, but not for a general xml tree structure. I'm also restricted to using xslt 1.0 methods. Thanks!
Selecting all unique items with a single XPath expression (without indexing, beware of performance issues):
//Item[not(@id = preceding::Item/@id)]
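To get the comma-separated output the question asks for, that expression can be used directly in an XSLT 1.0 stylesheet. The following short sketch uses text output and writes the separator between items rather than after each one.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- first Item of each id in document order; quadratic, fine for small inputs -->
    <xsl:for-each select="//Item[not(@id = preceding::Item/@id)]">
      <xsl:if test="position() > 1">,</xsl:if>
      <xsl:value-of select="@id"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the sample document this should print 1,2,3.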
Try this (using Muenchian grouping):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="item-id" match="Item" use="@id" />
<xsl:template match="/Node">
<xsl:for-each select="//Item[count(. | key('item-id', @id)[1]) = 1]">
<!-- separator between items, no trailing comma -->
<xsl:if test="position() > 1">,</xsl:if>
<xsl:value-of select="@id" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Not sure if this is what you mean, but just in case.
In the html
<xsl:apply-templates select="//Item"/>
The template:
<xsl:template match="Item">
<p>
<xsl:value-of select="@id"/> -
<xsl:value-of select="."/>
</p>
</xsl:template>

How can I build a tree from a flat XML list using XSLT?

I use a minimalist MVC framework, where the PHP controller hands the DOM model to the XSLT view (cf. okapi).
In order to build a navigation tree, I used nested sets in MySQL. This way, I end up with a model XML that looks as follows:
<tree>
<node>
<name>root</name>
<depth>0</depth>
</node>
<node>
<name>TELEVISIONS</name>
<depth>1</depth>
</node>
<node>
<name>TUBE</name>
<depth>2</depth>
</node>
<node>
<name>LCD</name>
<depth>2</depth>
</node>
<node>
<name>PLASMA</name>
<depth>2</depth>
</node>
<node>
<name>PORTABLE ELECTRONICS</name>
<depth>1</depth>
</node>
<node>
<name>MP3 PLAYERS</name>
<depth>2</depth>
</node>
<node>
<name>FLASH</name>
<depth>3</depth>
</node>
<node>
<name>CD PLAYERS</name>
<depth>2</depth>
</node>
<node>
<name>2 WAY RADIOS</name>
<depth>2</depth>
</node>
</tree>
which represents the following structure:
root
TELEVISIONS
TUBE
LCD
PLASMA
PORTABLE ELECTRONICS
MP3 PLAYERS
FLASH
CD PLAYERS
2 WAY RADIOS
How can I convert this flat XML list to a nested HTML list using XSLT?
PS: this is the example tree from the article "Managing Hierarchical Data in MySQL".
That form of flat list is very hard to work with in xslt, as you need to find the position of the next grouping, etc. Can you use different xml? For example, with the flat xml:
<?xml version="1.0" encoding="utf-8" ?>
<tree>
<node key="0">root</node>
<node key="1" parent="0">TELEVISIONS</node>
<node key="2" parent="1">TUBE</node>
<node key="3" parent="1">LCD</node>
<node key="4" parent="1">PLASMA</node>
<node key="5" parent="0">PORTABLE ELECTRONICS</node>
<node key="6" parent="5">MP3 PLAYERS</node>
<node key="7" parent="6">FLASH</node>
<node key="8" parent="5">CD PLAYERS</node>
<node key="9" parent="5">2 WAY RADIOS</node>
</tree>
It becomes trivial to do (very efficiently):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="nodeChildren" match="/tree/node" use="@parent"/>
<xsl:template match="tree">
<ul>
<xsl:apply-templates select="node[not(@parent)]"/>
</ul>
</xsl:template>
<xsl:template match="node">
<li>
<xsl:value-of select="."/>
<ul>
<xsl:apply-templates select="key('nodeChildren',@key)"/>
</ul>
</li>
</xsl:template>
</xsl:stylesheet>
Is that an option?
Of course, if you build the xml as a hierarchy it is even easier ;-p
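The stylesheet above emits an empty <ul> for every leaf node. If that is a problem, one small variation is to test the key before opening a nested list. The following sketch works against the same key/parent input.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="nodeChildren" match="/tree/node" use="@parent"/>
  <xsl:template match="tree">
    <ul>
      <xsl:apply-templates select="node[not(@parent)]"/>
    </ul>
  </xsl:template>
  <xsl:template match="node">
    <li>
      <xsl:value-of select="."/>
      <!-- only open a nested list when this node actually has children -->
      <xsl:if test="key('nodeChildren', @key)">
        <ul>
          <xsl:apply-templates select="key('nodeChildren', @key)"/>
        </ul>
      </xsl:if>
    </li>
  </xsl:template>
</xsl:stylesheet>
```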
In XSLT 2.0 it would be rather easy with the new grouping functions.
In XSLT 1.0 it's a little more complicated but this works:
<xsl:template match="/tree">
<xhtml>
<head/>
<body>
<ul>
<xsl:apply-templates select="node[depth='0']"/>
</ul>
</body>
</xhtml>
</xsl:template>
<xsl:template match="node">
<xsl:variable name="thisNodeId" select="generate-id(.)"/>
<xsl:variable name="depth" select="depth"/>
<xsl:variable name="descendants">
<xsl:apply-templates select="following-sibling::node[depth = $depth + 1][preceding-sibling::node[depth = $depth][1]/generate-id() = $thisNodeId]"/>
</xsl:variable>
<li>
<xsl:value-of select="name"/>
</li>
<xsl:if test="$descendants/*">
<ul>
<xsl:copy-of select="$descendants"/>
</ul>
</xsl:if>
</xsl:template>
The heart of the matter is the long and ugly "descendants" variable. It looks for nodes after the current node whose "depth" child is exactly one greater than the current depth. It keeps only those whose nearest preceding node at the current depth is the current node itself (otherwise they would be children of that other node instead of the current one).
BTW there is an error in your example result: "FLASH" should be a child of "MP3 PLAYERS" and not a sibling.
EDIT
In fact (as mentioned in the comments), in "pure" XSLT 1.0 this does not work, for two reasons: the path expression uses generate-id() incorrectly (as a location step, which only XPath 2.0 allows), and one cannot use a "result tree fragment" in a path expression.
Here is a correct XSLT 1.0 version of the "node" template (successfully tested with Saxon 6.5) that does not use EXSLT nor XSLT 1.1:
<xsl:template match="node">
<xsl:variable name="thisNodeId" select="generate-id(.)"/>
<xsl:variable name="depth" select="depth"/>
<xsl:variable name="descendants">
<xsl:apply-templates select="following-sibling::node[depth = $depth + 1][generate-id(preceding-sibling::node[depth = $depth][1]) = $thisNodeId]"/>
</xsl:variable>
<xsl:variable name="descendantsNb">
<xsl:value-of select="count(following-sibling::node[depth = $depth + 1][generate-id(preceding-sibling::node[depth = $depth][1]) = $thisNodeId])"/>
</xsl:variable>
<li>
<xsl:value-of select="name"/>
</li>
<xsl:if test="$descendantsNb > 0">
<ul>
<xsl:copy-of select="$descendants"/>
</ul>
</xsl:if>
</xsl:template>
Of course, one should factor out the repeated path expression, but without the ability to turn "result tree fragments" into XML that can actually be processed, I don't know if it's possible (writing a custom function would do the trick, of course, but then it's much simpler to use EXSLT).
Bottom line: use XSLT 1.1 or EXSLT if you can!
2nd Edit
To avoid repeating the path expression, you can also drop the test altogether. This will simply result in some empty <ul> elements that you can either leave in the result or remove in post-processing.
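The repetition can in fact be avoided in plain XSLT 1.0, without EXSLT. If the expression is bound with the select attribute of xsl:variable (instead of with element content), the variable holds a node-set rather than a result tree fragment. It can then be both tested and passed to apply-templates. Below is a sketch of the whole stylesheet written that way. As a deliberate deviation that yields valid HTML, it also nests each sub-list inside its <li>.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/tree">
    <html>
      <head/>
      <body>
        <ul>
          <xsl:apply-templates select="node[depth = 0]"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="node">
    <xsl:variable name="thisNodeId" select="generate-id(.)"/>
    <xsl:variable name="depth" select="depth"/>
    <!-- children: following siblings exactly one level deeper whose nearest
         preceding node at this depth is the current node.
         select= (not element content) makes this a node-set, not an RTF -->
    <xsl:variable name="children"
      select="following-sibling::node[depth = $depth + 1]
              [generate-id(preceding-sibling::node[depth = $depth][1]) = $thisNodeId]"/>
    <li>
      <xsl:value-of select="name"/>
      <xsl:if test="$children">
        <ul>
          <xsl:apply-templates select="$children"/>
        </ul>
      </xsl:if>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

With the sample data, FLASH should end up nested under MP3 PLAYERS.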
Very helpful!
One suggestion: moving the <ul> inside the template removes the empty <ul> elements.
<xsl:template match="tree">
<xsl:apply-templates select="node[not(@parent)]"/>
</xsl:template>
<xsl:template match="node">
<ul>
<li>
<xsl:value-of select="."/>
<xsl:apply-templates select="key('nodeChildren',@key)"/>
</li>
</ul>
</xsl:template>
</xsl:stylesheet>
You haven't actually said what you'd like the html output to look like, but I can tell you that from an XSLT point of view going from a flat structure to a tree is going to be complex and expensive if you're also basing this on the position of items in the tree and their relation to siblings.
It would be far better to supply a <parent> attribute/node than the <depth>.
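If changing the SQL query is not an option, the parent reference can also be derived from the depth-based list in a separate XSLT 1.0 pass. A node's parent is the nearest preceding node that is exactly one level shallower. The following sketch produces the key/parent form suggested in another answer, using generate-id() values as hypothetical keys.

```xslt
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/tree">
    <tree>
      <xsl:for-each select="node">
        <node key="{generate-id()}">
          <!-- parent = nearest preceding node exactly one level shallower -->
          <xsl:variable name="vParent"
            select="preceding-sibling::node[depth = current()/depth - 1][1]"/>
          <!-- the root (depth 0) has no such node, so it gets no parent attribute -->
          <xsl:if test="$vParent">
            <xsl:attribute name="parent">
              <xsl:value-of select="generate-id($vParent)"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:value-of select="name"/>
        </node>
      </xsl:for-each>
    </tree>
  </xsl:template>
</xsl:stylesheet>
```

The generated key values are processor-specific, but they are consistent within one run, which is all the key-based tree builder needs.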