XSLT 1.0: sorting between multiple documents

<xsl:for-each select="//filenames">
<xsl:variable name="current_filename" select="."/>
<xsl:for-each select="
document(.)//someNode[not(
. = document($current_filename/preceding-sibling::node())//someNode
)]
">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:for-each>
In the above code (XSLT 1.0), I have a series of documents (//filenames) that I want to open and select some nodes from, unless a node's value equals the value of the same node in any of the preceding documents.
To get this to work I had to nest two for-each loops, because I have to save the current document's name in a variable in order to select its preceding siblings ($current_filename/preceding-sibling).
This all works, but since I have two nested loops, I'm unable to sort the resulting nodes from all documents as if it were one big sequence. It now sorts the nodes per document if I insert a sorting rule into the first for-each.
Does anyone know a way to achieve this sorting anyway? Maybe a way to avoid having to use the variable and thus the nesting of for-each loops?

The only way to do that in one step is to store all the nodes in a variable and convert it to a node set with the node-set() extension function. The combined node-set can then be sorted normally.
If you can't use the node-set() function for some reason, you can only break up the operation in two separate transformation steps: 1) output nodes unsorted in a temp document, 2) transform temp document into desired output.
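A minimal sketch of the one-step approach, assuming the processor supports the EXSLT exsl:node-set() extension (namespace http://exslt.org/common). The element names filenames and someNode come from the question; using preceding-sibling::filenames rather than preceding-sibling::node() is an assumption that the file names are sibling elements, and it avoids passing whitespace text nodes to document().

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Step 1: collect the wanted nodes from every document, unsorted,
         into a single result tree fragment -->
    <xsl:variable name="collected">
      <xsl:for-each select="//filenames">
        <xsl:variable name="current_filename" select="."/>
        <xsl:copy-of select="document(.)//someNode[not(
            . = document($current_filename/preceding-sibling::filenames)//someNode)]"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- Step 2: turn the fragment into a node-set and sort it as one sequence -->
    <xsl:for-each select="exsl:node-set($collected)/someNode">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the sort now runs over the combined copy rather than inside the per-document loop, the nodes are ordered across all documents at once.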

You could put the whole result inside a variable; then, using node-set(), you can re-sort the results. See here for examples of using node-set(): http://www.exslt.org/exsl/functions/node-set/index.html
Josh

I've found out how to do this!
By first just selecting all nodes, and sorting them, I was able to then filter out the nodes I didn't want with ! So I changed the order of selecting/sorting. First selecting followed by sorting was impossible, but the other way around works fine! Thanks for your input though :).

Related

XSLT -- detect if node has already been copied to the result tree

Using xsltproc to clean up input XML.
Think about a part number referencing a part description from random locations in the document. My XML input is poorly designed and it has part number references to part descriptions all over with no real pattern as to where they are located. Some references are text in elements, some in attributes, sometimes the attribute changes meaning depending on context. The attribute containing the part number does not have a consistent name, the name used alters depending on the value of other attributes. Maybe I could build a key selecting the dozen varying places containing part number but it would be a mess. I would also worry about inadvertently selecting the wrong items with complex patterns.
So my goal is to only copy the referenced part descriptions to the output document once (not all descriptions are referenced). I can insert tests in all of the various templates to detect the part number in context. The very simple solution would be to just test if it has already been copied to the result tree and not copy it again. But there is no way to track this?
Plan B is to copy it multiple times to the result tree and then do a second pass over the output document to remove the duplicates.
The use of temporal language in the question ("has already been") is a good clue that you're thinking about this the wrong way. In a declarative language, you shouldn't be thinking in terms of the order of processing.
What you're probably looking for is something like this:
<xsl:variable name="first-of-a-kind-part-references" as="node()*">
<xsl:for-each-group select="f:all-part-references(/)"
group-by="f:get-referenced-part(.)/@id">
<xsl:sequence select="current-group()[1]"/>
</xsl:for-each-group>
</xsl:variable>
and then when processing a part reference
<xsl:if test=". intersect $first-of-a-kind-part-references">
...
</xsl:if>

Revisited: Sort elements of arbitrary XML document recursively

This chapter in my XSLT saga is an extension of the question here. Thanks to all of you who have helped me get this far (@Martin Honnen, @Ian Roberts, @Tim C, and anyone else I missed)!
Here is my current problem:
I reorder some siblings in A_v1.xml to create A_v2.xml. I now consider these two files to be different "versions" of the same file. The two files have the exact same content; only some siblings are in a different order. Put another way, each element in A_v2.xml still has the same parent as it did in A_v1.xml, but it may now occur before siblings it used to occur after, or after siblings it used to occur before.
I transform A_v1.xml into A_v1_transformed.xml
I transform A_v2.xml into A_v2_transformed.xml
I compare A_v1_transformed.xml to A_v2_transformed.xml and, to my dismay, they are not identical. Furthermore, neither of them is in the expected order shown in expected.xml. They have the same content, but the elements are not sorted in the same order.
My first sort is <xsl:sort select="local-name()"/>. @G. Ken Holman turned me onto <xsl:sort select="."/> (which has the same effect as <xsl:sort select="self::*"/>, which I was using). When I use those two sorts in combination I get almost exactly what I want, but in some places the expected alphabetical order seems to be randomly broken.
I have beefed up my sample files. To keep the question short I just put them on pastebin.
A_v1.xml
A_v2.xml
A_v1_transformed.xml
A_v2_transformed.xml
Here is one of the transformed files with comments added by me to help you understand where/why I think the transform sorted these files incorrectly. I didn't comment the other transformed file because it has similar "failures".
A_v1_transformed_with_comments.xml
Both of the transformed documents should have the same checksum as expected.xml, but they don't. That is my biggest concern. Alphabetical sorting seems the most sane way to sort, but so long as the transform sorted in some sane way I couldn't care less how the sort happened so long as the sort is repeatable among different "versions" of the same file.
expected.xml
The following XSL files both yield the same result, but the "multi-pass" version may be easier to understand.
xsl_concise.xsl
xsl_multi_pass.xsl
Points for discussion:
I have noticed that when sorting alphabetically CAPITALIZED letters take precedence. Even if the capitalized letter comes after a lower case letter alphabetically it will come first in the sort.
Partial success...
I think I may have stumbled onto a partial solution myself, but I am unclear why it works. If you look at my xsl_multi_pass.xsl file you will see:
<!-- Third pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt">
<xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
</xsl:variable>
<!-- Fourth pass with deDup mode templates -->
<xsl:apply-templates mode="deDup" select="$sortElementsRslt"/>
If I turn that into:
<!-- Third pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt1">
<xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
</xsl:variable>
<!-- Fourth pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt2">
<xsl:apply-templates mode="sortElements" select="$sortElementsRslt1"/>
</xsl:variable>
<!-- Fifth pass with deDup mode templates -->
<xsl:apply-templates mode="deDup" select="$sortElementsRslt2"/>
This sorts the elements twice, I don't know why it is necessary. The result using the example files I have provided is what I expected minus the CAPITALIZED letters taking precedence, but that doesn't bother me so long as the result is consistent which it appears to be. The problem is that this "solution" causes another part of the real files I'm working with to be sorted inconsistently.
SUCCESS!
I think I finally got this working 100% how I want. I incorporated the function given in the answer here by @Dimitre Novatchev to sort elements by their attribute names and values. For some reason I still have to perform two passes to sort the elements (applying the exact same templates twice), as I described above, but it only takes an extra 3 seconds on a 20MB file, so I'm not too worried about it.
Here is the final result:
xsl_2.0_full_document_sorter.xsl
In a nutshell, my ultimate goal with all of my XSLT questions is a stylesheet that, when applied to a file, will always generate the same result even if run on different "versions" of that file. A different "version" of a file would be one that has the exact same content, just in a different order. That means an element's attributes may have been moved around and elements may occur earlier or later than they previously did.
Have you considered a different tool rather than XSLT for this purpose? The goal you've described sounds to me pretty much exactly the definition of similar() in XMLUnit
// control and test are the two XML documents you want to compare, they can
// be String, Reader, org.w3c.dom.Document or org.xml.sax.InputSource
Diff d = new Diff(control, test);
assert d.similar();
SUCCESS!
I think I finally got this working 100% how I want. I incorporated the function given in the answer here by @Dimitre Novatchev to sort elements by their attribute names and values. I still have to perform two passes to sort the elements (applying the exact same templates twice) as I described above for some reason, but it only takes an extra 3 seconds on a 20MB file, so I'm not too worried about it.
Here is the final result:
xsl_2.0_full_document_sorter.xsl
This transform is 100% generic and should be usable on any XML document to sort it in what I would consider the most sane way possible. The major benefit of this stylesheet is that it transforms multiple files that have the same content in different orders in exactly the same way, so the transformed results of all such files will be identical.

XSLT 2.0: Limit the ancestor axes to a certain element/s level up the document tree

I'm seeing quite odd behaviour: when trying to limit the results given by applying ancestor::* to an element, I always get an extra ancestor although it is expressly excluded by the predicate.
Here the code:
XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<level_a>
<level_b>
<level_c>
<level_d>
<level_e/>
</level_d>
</level_c>
</level_b>
</level_a>
<level_b>
<level_c>
<level_d>
<level_d>
<level_e/>
</level_d>
</level_d>
</level_c>
</level_b>
</root>
XPath:
(//level_d[not(level_d)])[last()]/ancestor::*[level_c|level_b]
So basically I'm selecting the level_d elements that don't have another level_d element nested inside them, taking the last one of them, and trying to get all the ancestors up to element level_b.
But the result I'm seeing using Altova XMLSpy 2011 is:
level_a
level_b
I don't quite understand why I'm getting that result and how can I improve my xpath to limit effectively the ancestors up to level_b (i.e. level_c and level_b).
Any hint is greatly appreciated!
Regards
Vlax
Well ancestor::*[level_c|level_b] selects all elements on the ancestor axis that have a level_c or level_b child.
You might want (//level_d[not(level_d)])[last()]/ancestor::*[self::level_c|self::level_b].
Or with your textual description "to limit effectively the ancestors to level_b" you simply want (//level_d[not(level_d)])[last()]/ancestor::level_b.
I think you get the right result, because I read the clause ancestor::*[level_c|level_b] as "all ancestors containing a level_b or level_c child element". So level_b matches because it contains level_c, and level_a matches too because it contains level_b.
If I change your XPath to (//level_d[not(level_d)])[last()]/ancestor::*[level_c], the result is level_b only.
This is probably not exactly what you are asking for, but I'm not sure I fully understand the purpose of your XPath :-)
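To make the difference between the two predicates concrete, here is a small sketch of a stylesheet that lists only the ancestors that are themselves level_b or level_c. It uses the self axis suggested above; the stylesheet wrapper and the text output are illustrative assumptions, not part of the original posts.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- [self::level_b or self::level_c] tests the ancestor itself;
         [level_b | level_c] would test the ancestor's children instead -->
    <xsl:for-each select="(//level_d[not(level_d)])[last()]
                          /ancestor::*[self::level_b or self::level_c]">
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample XML from the question, this should print level_b followed by level_c, since xsl:for-each visits the selected ancestors in document order.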

XSLT 1.0 Preliminary processing strategy

(Let me risk a general strategy question that might offend protocol and embarrass me.)
Before the element-by-element transformation of my input XML into HTML, I need to determine the order in which elements will be presented. Doing so requires recursive analyses, including merging ordered lists and tracing a graph, all based on attributes of the elements to be presented.
After that processing I have, in effect, an ordered list of attribute values, and I will present the elements sorted by that list.
Which strategy is better?
1) Put the results of the pre-analysis into a global variable, which would be a list of attribute values, and then iterate through that list, something like this:
<xsl:variable name="orderOfPresentation">
<xsl:call-template name="analyses"/>
</xsl:variable>
<xsl:template match="root">
<xsl:for-each select="$orderOfPresentation">
<xsl:apply-templates select="/" />
</xsl:for-each>
</xsl:template>
or
2) Apply formatting templates deep in the analyses, once the ordered list has been determined, without closing out the recursions, something like this:
<xsl:template match="root">
<xsl:call-template name="analysis">
[with, as parameters, various sets of attribute nodes, extracted from the input XML]
</xsl:call-template>
</xsl:template>
<xsl:template name="analysis">
[recursions that include calls to sub-analysis]
</xsl:template>
<xsl:template name="sub-analysis">
[recursions that include calls to sub-sub-analysis]
</xsl:template>
<xsl:template name="sub-sub-analysis">
[more work, which eventually produces an ordered list, $orderOfPresentation]
<xsl:for-each select="$orderOfPresentation">
<xsl:apply-templates select="/" />
</xsl:for-each>
</xsl:template>
The first strategy may just expose the thinking of a procedural-language programmer but does seem to have the benefit of letting the processor clean up after itself before getting on with the business of the real transformation.
But XSLT 1.0 (in which I must work, without extensions) doesn't have a simple way to represent a list of string values, and (if I understand this correctly) can only pass back to the global variable a Result Tree Fragment (something I fail to understand). So strategy (1) seems bad.
But (2), a strategy that calls all the real transformation from deep inside a recursion, seems inefficient and difficult to maintain.
Is one of these strategies the best practice? Or are they both evidence of a confused mind?
Like Doc Brown, I would use a third strategy: generate the target stylesheet, then execute it. It's quite possible to do this in the browser with a bit of Javascript.
The needed result of the pre-processing is a list of strings (an ordered list of attribute values), so:
Ignore the fact that an RTF is a tree fragment and just treat it like a string. In the called template, don't create XML nodes, just output strings using value-of plus a delimiter.
The global variable becomes effectively just a list of strings. (The functions substring-before( , delimiter) and substring-after( , delimiter) can form a nice little pop function for the list.)
So get the benefits of finishing the preprocessing first (strategy 1) instead of launching the transformation from deep inside the preprocessing (strategy 2), but do so by thinking of the global variable not as an RTF but as an ordered list of strings.
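The delimited-string idea can be sketched roughly as follows, using only XSLT 1.0 without extensions. The input shape (root/item elements with id and rank attributes), the '|' delimiter, and the present template name are all hypothetical; the body of the analyses template is only a stand-in for the real recursive pre-analysis.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Global variable: the pre-analysis writes plain strings plus a delimiter,
       so the result tree fragment can be treated as one long string -->
  <xsl:variable name="orderOfPresentation">
    <xsl:call-template name="analyses"/>
  </xsl:variable>

  <!-- Stand-in for the real analysis (hypothetical: order items by @rank) -->
  <xsl:template name="analyses">
    <xsl:for-each select="/root/item">
      <xsl:sort select="@rank" data-type="number"/>
      <xsl:value-of select="@id"/>
      <xsl:text>|</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="present">
      <xsl:with-param name="list" select="string($orderOfPresentation)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- "Pop" the head of the list, present that element, recurse on the rest -->
  <xsl:template name="present">
    <xsl:param name="list"/>
    <xsl:if test="contains($list, '|')">
      <xsl:variable name="head" select="substring-before($list, '|')"/>
      <xsl:apply-templates select="/root/item[@id = $head]"/>
      <xsl:call-template name="present">
        <xsl:with-param name="list" select="substring-after($list, '|')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="item">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

The delimiter has to be a character that cannot occur in the attribute values themselves, otherwise the pop step would split a value in two.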

Copying an element (xsl:function) from the stylesheet to the "result" XML

First, I should say that I am a beginner in terms of XSLT.
Although the exact context may not be so relevant (and might be too confusing), I will provide it below.
I have a chained transformation, which looks like this:
Input.xml is the input file for this transformation, which is performed using transform.xsl. The result of this transformation is output.xml. transform.xsl contains a classic custom xsl:function:
<xsl:function name="my:f">
<xsl:sequence select=".. xpath .."/>
</xsl:function>
The result from step 1 (output.xml) is, for step 2, a new transformer (transform2.xsl), which will be using some other XML input (let's say input2.xml).
What I would like to do is to copy the xsl:function node entirely (present in the transform.xsl in step 1) to the output.xml, so that it can be used in step 2.
No updates / changes are needed in this case for the xsl:function while copying it (just a simple node copy).
Note that I do not want to copy the xsl:function only when a given input element (from input.xml) is present. But rather, I want to copy it always, no matter what the input.xml is.
Now I know this can be done by having a separate file which contains my xsl:function, and then using xsl:import to include this file from both transformations (transform.xsl and transform2.xsl).
But I would like to know if there are other ways of accomplishing this (..without having a separate file where the function is declared / defined)?
Thanks in advance,
M.
You can access the stylesheet document using document('') so doing e.g.
<xsl:template match="/*">
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:copy-of select="document('')/xsl:stylesheet/xsl:function"/>
<xsl:apply-templates/>
</xsl:stylesheet>
</xsl:template>
should copy any xsl:function elements in the stylesheet to the result tree.
[edit]
After the edit it seems you want to copy a function of a certain name: if you want to copy the function of certain name then you could do e.g.
<xsl:copy-of select="document('')/xsl:stylesheet/xsl:function[
resolve-QName(@name, .) eq QName('http://example.com/ns', 'f')]"/>
where f is the local name of the function and http://example.com/ns is the namespace the function is defined in.
You can use the built-in document() function, which will return the stylesheet document for an empty URI. Then you can just copy the element to the output.
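One caveat when the output is itself a stylesheet: an element in the XSLT namespace written inside a template is treated as an instruction, not as output. A common way around this is xsl:namespace-alias. The sketch below assumes that approach; the axsl prefix, its alias namespace URI, and the body of my:f are illustrative choices, not taken from the original posts.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
    xmlns:my="http://example.com/ns">

  <!-- Literal result elements written as axsl:* are output as xsl:* -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

  <!-- The function to carry over into the generated stylesheet
       (placeholder body) -->
  <xsl:function name="my:f">
    <xsl:sequence select="'example'"/>
  </xsl:function>

  <xsl:template match="/*">
    <axsl:stylesheet version="2.0">
      <!-- document('') is this stylesheet; copy its function declarations
           regardless of what the input document contains -->
      <xsl:copy-of select="document('')/xsl:stylesheet/xsl:function"/>
    </axsl:stylesheet>
  </xsl:template>
</xsl:stylesheet>
```

Since the copy is driven by the stylesheet document rather than by input.xml, the function is emitted no matter which elements the input contains.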