XSLT 2.0: declare string array

Is it possible to declare a string array in XSLT 2.0?
I would like to declare something like this:
<xsl:variable name="countries" select="('IT','EN','SP')" />
but it doesn't work.
Thanks.

There are no arrays in the data model that XSLT/XPath 2.0 work with. Your code <xsl:variable name="countries" select="('IT','EN','SP')" /> binds a sequence of string values to the variable named countries, and you should be able to access single items with a positional predicate such as $countries[2].
If you think you need arrays, then you need XPath 3.1, where you can use <xsl:variable name="countries" select="['IT', 'EN', 'SP']"/>; see http://www.w3.org/TR/xpath-31/#id-arrays. I am not aware of any released XSLT processor that supports this yet; however, I think XQuery implementations like BaseX have already been upgraded to support it.
As long as you simply want to store three or more atomic values in a flat data type, I don't see why you would need the new array feature; the sequence ('IT','EN','SP') should suffice. Arrays can be nested but sequences cannot: (1, (2, 3), 4) results in the flat sequence (1, 2, 3, 4). So if you need a nested data structure (with primitive values like numbers or strings), you might need an array and have to wait for XPath 3.1 support. You can, however, always create an XML tree structure for nested data in current versions of XSLT.

Related

XSLT -- detect if node has already been copied to the result tree

Using xsltproc to clean up input XML.
Think about a part number referencing a part description from random locations in the document. My XML input is poorly designed: it has part number references to part descriptions all over, with no real pattern as to where they are located. Some references are text in elements, some are in attributes, and sometimes the attribute changes meaning depending on context. The attribute containing the part number does not have a consistent name; the name used changes depending on the values of other attributes. Maybe I could build a key selecting the dozen varying places that contain part numbers, but it would be a mess. I would also worry about inadvertently selecting the wrong items with complex patterns.
So my goal is to copy the referenced part descriptions to the output document only once (not all descriptions are referenced). I can insert tests in all of the various templates to detect the part number in context. The very simple solution would be to test whether it has already been copied to the result tree and not copy it again. But is there no way to track this?
Plan B is to copy it multiple times to the result tree and then do a second pass over the output document to remove the duplicates.
The use of temporal language in the question ("has already been") is a good clue that you're thinking about this the wrong way. In a declarative language, you shouldn't be thinking in terms of the order of processing.
What you're probably looking for is something like this:
<xsl:variable name="first-of-a-kind-part-references" as="node()*">
<xsl:for-each-group select="f:all-part-references(/)"
group-by="f:get-referenced-part(.)/@id">
<xsl:sequence select="current-group()[1]"/>
</xsl:for-each-group>
</xsl:variable>
and then, when processing a part reference:
<xsl:if test=". intersect $first-of-a-kind-part-references">
...
</xsl:if>
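The grouping answer above relies on XSLT 2.0 (xsl:for-each-group, intersect), but xsltproc, which the question uses, only implements XSLT 1.0. A common XSLT 1.0 substitute for the same "first of a kind" test is Muenchian grouping with xsl:key and generate-id(). The sketch below shows that swapped-in technique, not the answerer's code. The element name ref and the attribute part are hypothetical stand-ins for the messy part-number locations, and the stylesheet is run through the JDK's built-in javax.xml.transform processor only so it can be tested.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FirstOfAKind {
    // XSLT 1.0: the key groups (hypothetical) ref elements by @part; a ref is
    // "first of its kind" when it is the first node the key returns for its part.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:key name=\"refs-by-part\" match=\"ref\" use=\"@part\"/>" +
        "<xsl:template match=\"/\"><xsl:apply-templates select=\"//ref\"/></xsl:template>" +
        "<xsl:template match=\"ref\">" +
        // Only the first reference per part number produces output
        "<xsl:if test=\"generate-id() = generate-id(key('refs-by-part', @part)[1])\">" +
        "<xsl:value-of select=\"@part\"/>" +
        "</xsl:if>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the text output
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the real document, the key's match pattern would have to cover every place a part number can appear, which is exactly the part the question worries about.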

Revisited: Sort elements of arbitrary XML document recursively

This chapter in my XSLT saga is an extension of the question here. Thanks to all of you who have helped me get this far (@Martin Honnen, @Ian Roberts, @Tim C, and anyone else I missed)!
Here is my current problem:
I reorder some siblings in A_v1.xml to create A_v2.xml. I now consider these two files to be different "versions" of the same file. The two files have exactly the same content; only some siblings are in a different order. Put another way, each element in A_v2.xml still has the same parent as it did in A_v1.xml, but it may now occur before siblings it used to occur after, or after siblings it used to occur before.
I transform A_v1.xml into A_v1_transformed.xml
I transform A_v2.xml into A_v2_transformed.xml
I compare A_v1_transformed.xml to A_v2_transformed.xml and, to my dismay, they are not identical. Furthermore, neither of them is in the expected order shown in expected.xml. They have the same content, but the elements are not sorted in the same order.
My first sort is <xsl:sort select="local-name()"/>. @G. Ken Holman turned me on to <xsl:sort select="."/> (which has the same effect as the <xsl:sort select="self::*"/> I was using). When I use those two sorts in combination I get almost exactly what I want, but in some places the expected alphabetical order seems to be randomly broken.
I have beefed up my sample files. To keep the question short I just put them on pastebin.
A_v1.xml
A_v2.xml
A_v1_transformed.xml
A_v2_transformed.xml
Here is one of the transformed files with comments added by me to help you understand where/why I think the transform sorted these files incorrectly. I didn't comment the other transformed file because it has similar "failures".
A_v1_transformed_with_comments.xml
Both of the transformed documents should have the same checksum as expected.xml, but they don't. That is my biggest concern. Alphabetical sorting seems the most sane way to sort, but so long as the transform sorted in some sane way I couldn't care less how the sort happened so long as the sort is repeatable among different "versions" of the same file.
expected.xml
The following XSL files both yield the same result, but the "multi-pass" version may be easier to understand.
xsl_concise.xsl
xsl_multi_pass.xsl
Points for discussion:
I have noticed that when sorting alphabetically, CAPITALIZED letters take precedence: even if a capitalized letter comes after a lowercase letter alphabetically, it comes first in the sort.
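The behaviour described above matches plain code-point comparison: every uppercase ASCII letter (U+0041 to U+005A) precedes every lowercase one (U+0061 to U+007A), so "Beta" sorts before "alpha". The Java sketch below illustrates this with String's natural ordering, which compares UTF-16 code units, and then shows one way to get a repeatable order that looks case-insensitive: compare case-insensitively first and break ties with the exact string. In an XSLT 2.0 stylesheet the analogous idea would be two xsl:sort keys, for example lower-case(local-name()) followed by local-name(); that is a suggestion, not something tested here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class CaseOrderDemo {
    // Natural String order compares UTF-16 code units, so 'B' (U+0042) < 'a' (U+0061)
    static List<String> codepointOrder(List<String> names) {
        List<String> out = new ArrayList<>(names);
        Collections.sort(out);
        return out;
    }

    // Case-insensitive primary key, exact string as tie-breaker:
    // a total order, so the result does not depend on the input order
    static List<String> caseInsensitiveThenExact(List<String> names) {
        List<String> out = new ArrayList<>(names);
        out.sort(String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.<String>naturalOrder()));
        return out;
    }
}
```

For the input beta, Alpha, alpha, Beta, gamma, the first method yields Alpha, Beta, alpha, beta, gamma and the second yields Alpha, alpha, Beta, beta, gamma. The tie-breaker matters for the goal of identical output across "versions": without it, names that differ only in case would keep whatever relative order they had in the input.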
Partial success...
I think I may have stumbled onto a partial solution myself, but I am unclear why it works. If you look at my xsl_multi_pass.xsl file you will see:
<!-- Third pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt">
<xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
</xsl:variable>
<!-- Fourth pass with deDup mode templates -->
<xsl:apply-templates mode="deDup" select="$sortElementsRslt"/>
If I turn that into:
<!-- Third pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt1">
<xsl:apply-templates mode="sortElements" select="$sortAttributesRslt"/>
</xsl:variable>
<!-- Fourth pass with sortElements mode templates -->
<xsl:variable name="sortElementsRslt2">
<xsl:apply-templates mode="sortElements" select="$sortElementsRslt1"/>
</xsl:variable>
<!-- Fifth pass with deDup mode templates -->
<xsl:apply-templates mode="deDup" select="$sortElementsRslt2"/>
This sorts the elements twice; I don't know why that is necessary. The result using the example files I have provided is what I expected, minus the CAPITALIZED letters taking precedence, but that doesn't bother me as long as the result is consistent, which it appears to be. The problem is that this "solution" causes another part of the real files I'm working with to be sorted inconsistently.
In a nutshell, my ultimate goal with all of my XSLT questions is a stylesheet that, when applied to a file, will always generate the same result, even if run on different "versions" of that file. A different "version" of a file is one that has exactly the same content, just in a different order. That means an element's attributes may have been moved around and elements may occur earlier or later than they previously did.
Have you considered a tool other than XSLT for this purpose? The goal you've described sounds pretty much exactly like the definition of similar() in XMLUnit:
// control and test are the two XML documents you want to compare, they can
// be String, Reader, org.w3c.dom.Document or org.xml.sax.InputSource
Diff d = new Diff(control, test);
assert d.similar();
SUCCESS!
I think I finally got this working 100% how I want. I incorporated the function given in the answer here by @Dimitre Novatchev to sort elements by their attribute names and values. I still have to perform two passes to sort the elements (applying the exact same templates twice) as I described above for some reason, but it only takes an extra 3 seconds on a 20MB file, so I'm not too worried about it.
Here is the final result:
xsl_2.0_full_document_sorter.xsl
This transform is 100% generic and should work on any XML document, sorting it in what I would consider the sanest way possible. Its major benefit is that it transforms multiple files that have the same content in different orders in exactly the same way, so the transformed results of all those files will be identical.
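One plausible explanation for needing two sortElements passes: <xsl:sort select="."/> uses the string value of the original, still unsorted subtree, so two "versions" can produce different sort keys for the same element; a second pass recomputes the keys from descendants that were sorted in the first pass. Sorting bottom-up avoids this. The Java sketch below (JDK DOM only, not the author's stylesheet) builds a canonical string in which every element's children are canonicalized before the element's siblings are compared. It is deliberately simplified: it drops comments and processing instructions, does not escape values, and discards the relative order of mixed content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class CanonicalOrder {
    // Canonical string for a node. Children are canonicalized first (bottom-up),
    // so each sibling's sort key already reflects its sorted descendants.
    static String canon(Node n) {
        short type = n.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return n.getNodeValue().trim(); // whitespace-only text becomes "" and is dropped
        }
        if (type != Node.ELEMENT_NODE) {
            return ""; // comments and processing instructions are ignored in this sketch
        }
        Element e = (Element) n;
        List<String> attrs = new ArrayList<>();
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Node a = map.item(i);
            attrs.add(a.getNodeName() + "=\"" + a.getNodeValue() + "\"");
        }
        Collections.sort(attrs); // attribute order carries no meaning in XML
        List<String> kids = new ArrayList<>();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            String k = canon(c);
            if (!k.isEmpty()) kids.add(k);
        }
        Collections.sort(kids); // siblings ordered by their full canonical form
        StringBuilder sb = new StringBuilder("<").append(e.getTagName());
        for (String a : attrs) sb.append(' ').append(a);
        sb.append('>');
        for (String k : kids) sb.append(k);
        return sb.append("</").append(e.getTagName()).append('>').toString();
    }

    static String canonicalize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return canon(doc.getDocumentElement());
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }
}
```

For example, <r><b x="1"/><a><d/><c/></a></r> and <r><a><c/><d/></a><b x="1"/></r> both canonicalize to <r><a><c></c><d></d></a><b x="1"></b></r>. The same bottom-up principle could in principle be applied in XSLT by sorting on keys computed from already-sorted child results, but that is untested here.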

XSLT 1.0 Preliminary processing strategy

(Let me risk a general strategy question that might offend protocol and embarrass me.)
Before the element-by-element transformation of my input XML into HTML, I need to determine the order in which elements will be presented. Doing so requires recursive analyses, including merging ordered lists and tracing a graph, all based on attributes of the elements to be presented.
After that processing I have, in effect, an ordered list of attribute values, and I will present the elements sorted by that list.
Which strategy is better?
1) Put the results of the pre-analysis into a global variable, which would be a list of attribute values, and then iterate through that list, something like this:
<xsl:variable name="orderOfPresentation">
<xsl:call-template name="analyses"/>
</xsl:variable>
<xsl:template match="root">
<xsl:for-each select="$orderOfPresentation">
<xsl:apply-templates select="/" />
</xsl:for-each>
</xsl:template>
or
2) Apply formatting templates deep in the analyses, once the ordered list has been determined, without closing out the recursions, something like this:
<xsl:template match="root">
<xsl:call-template name="analysis">
[with, as parameters, various sets of attribute nodes, extracted from the input XML]
</xsl:call-template>
</xsl:template>
<xsl:template name="analysis">
[recursions that include calls to sub-analysis]
</xsl:template>
<xsl:template name="sub-analysis">
[recursions that include calls to sub-sub-analysis]
</xsl:template>
<xsl:template name="sub-sub-analysis">
[more work, which eventually produces an ordered list, $orderOfPresentation]
<xsl:for-each select="$orderOfPresentation">
<xsl:apply-templates select="/" />
</xsl:for-each>
</xsl:template>
The first strategy may just expose the thinking of a procedural-language programmer but does seem to have the benefit of letting the processor clean up after itself before getting on with the business of the real transformation.
But XSLT 1.0 (in which I must work, without extensions) doesn't have a simple way to represent a list of string values, and (if I understand this correctly) can only pass back to the global variable a Result Tree Fragment (something I fail to understand). So strategy (1) seems bad.
But (2), a strategy that calls all the real transformation from deep inside a recursion, seems inefficient and difficult to maintain.
Is one of these strategies the best practice? Or are they both evidence of a confused mind?
Like Doc Brown, I would use a third strategy: generate the target stylesheet, then execute it. It's quite possible to do this in the browser with a bit of Javascript.
The needed result of the pre-processing is a list of strings (an ordered list of attribute values), so:
Ignore the fact that an RTF is a tree fragment and just treat it like a string. In the called template, don't create XML nodes, just output strings using value-of plus a delimiter.
The global variable effectively becomes just a list of strings. (The functions substring-before(list, delimiter) and substring-after(list, delimiter) can form a nice little pop function for the list.)
So you get the benefits of finishing the preprocessing first (strategy 1) instead of launching the transformation from deep inside the preprocessing (strategy 2), but you do so by thinking of the global variable not as an RTF but as an ordered list of strings.
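The delimited-string idea can be sketched as a complete XSLT 1.0 stylesheet, run here through the JDK's built-in javax.xml.transform processor so it can be tested. The pre-analysis (a hypothetical analysis template that just orders item elements by a rank attribute, standing in for the real recursive work) writes only text into the global variable, as a ';'-terminated list. The present template then pops the head with substring-before() and recurses on substring-after(). The element and attribute names and the ';' delimiter are assumptions; the delimiter must not occur inside any value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DelimitedList {
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        // Strategy 1: finish the pre-analysis first; the RTF's string value is e.g. "c;b;a;"
        "<xsl:variable name=\"orderOfPresentation\">" +
        "  <xsl:call-template name=\"analysis\"/>" +
        "</xsl:variable>" +
        // Hypothetical analysis: emits only strings (id plus delimiter), never nodes
        "<xsl:template name=\"analysis\">" +
        "  <xsl:for-each select=\"/root/item\">" +
        "    <xsl:sort select=\"@rank\" data-type=\"number\"/>" +
        "    <xsl:value-of select=\"@id\"/><xsl:text>;</xsl:text>" +
        "  </xsl:for-each>" +
        "</xsl:template>" +
        "<xsl:template match=\"/\">" +
        "  <xsl:call-template name=\"present\">" +
        "    <xsl:with-param name=\"list\" select=\"string($orderOfPresentation)\"/>" +
        "  </xsl:call-template>" +
        "</xsl:template>" +
        // Pop the head with substring-before, recurse on substring-after
        "<xsl:template name=\"present\">" +
        "  <xsl:param name=\"list\"/>" +
        "  <xsl:if test=\"$list != ''\">" +
        "    <xsl:variable name=\"head\" select=\"substring-before($list, ';')\"/>" +
        "    <xsl:variable name=\"rest\" select=\"substring-after($list, ';')\"/>" +
        "    <xsl:value-of select=\"/root/item[@id = $head]\"/>" +
        "    <xsl:if test=\"$rest != ''\">, </xsl:if>" +
        "    <xsl:call-template name=\"present\">" +
        "      <xsl:with-param name=\"list\" select=\"$rest\"/>" +
        "    </xsl:call-template>" +
        "  </xsl:if>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the text output
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With items b (rank 2, "Beta"), a (rank 3, "Alpha") and c (rank 1, "Gamma"), the global variable holds "c;b;a;" and the output is "Gamma, Beta, Alpha". No node-set() extension is needed, because the variable is only ever used through its string value.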

XSLT 2.0 How to change default formatting for numbers?

In XSLT 1.0 (using Xalan), outputting the result of:
<xsl:variable name="source0" select="number(num3)"/>
<xsl:value-of select="$source0"/>
was the number spelled out as 2011234. But in XSLT 2.0 (using Saxon), it shows up as 2.011234E6. I want it to always display as 2011234 in the Saxon/2.0 case.
Is there a way to set the default picture string for whenever it outputs a number?
I saw decimal-format, but that only affects picture strings; it doesn't set the default number formatting. I can't just put format-number everywhere, since then I'd have to check data types everywhere and... it would be a mess.
The E6 form appears because number() returns an xs:double, and XPath 2.0 converts an xs:double whose absolute value is 1E6 or greater to a string in scientific notation (casting to xs:decimal instead avoids the exponent). There is no way to express in XSLT 2.0 (or XSLT 1.0) that every time a number value is output it must be in a "default" format, other than by using fn:format-number(), xsl:decimal-format, op:cast or the built-in type constructors. The only way for every number to be treated as some specific type is for a schema to be declared for the input (so it is a PSVI) and for the transformation to run on a schema-aware processor.

XSLT 1.0: sorting between multiple documents

<xsl:for-each select="//filenames">
<xsl:variable name="current_filename" select="."/>
<xsl:for-each select="
document(.)//someNode[not(
. = document($current_filename/preceding-sibling::node())//someNode
)]
">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:for-each>
In the above code (XSLT 1.0), I have a series of documents (//filenames), which I want to open and select some nodes from, unless a node's value equals the value of the same node in any of the preceding documents.
To get this to work I had to nest two for-each loops, because I have to save the current document's name in a variable in order to select its preceding siblings ($current_filename/preceding-sibling).
This all works, but because the loops are nested, I'm unable to sort the resulting nodes from all documents as if they were one big sequence. If I insert a sorting rule into the first for-each, it only sorts the nodes per document.
Does anyone know a way to achieve this sorting anyway? Maybe a way to avoid having to use the variable and thus the nesting of for-each loops?
The only way to do that in one step is to store all the nodes in a variable and convert it to a node set with the node-set() extension function. The combined node-set can then be sorted normally.
If you can't use the node-set() function for some reason, you can only break up the operation into two separate transformation steps: 1) output the nodes unsorted into a temporary document, 2) transform the temporary document into the desired output.
You could put the whole result inside a variable and then use node-set() to re-sort the results. See here for examples of using node-set: http://www.exslt.org/exsl/functions/node-set/index.html
Josh
I've found out how to do this!
By first just selecting all nodes and sorting them, I was then able to filter out the nodes I didn't want inside the sorted loop. So I changed the order of selecting and sorting: selecting first and then sorting was impossible, but the other way around works fine! Thanks for your input though :).
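A single-document simplification of the "select and sort everything, then filter inside the loop" approach: each f element stands in for one of the documents opened with document(), someNode is the node from the question, and the duplicate test uses the preceding axis, which does not cross document boundaries. With genuinely separate documents the test inside the loop would have to be written differently, so treat this only as an illustration of the loop structure. It runs on the JDK's built-in XSLT 1.0 processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SortThenFilter {
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:template match=\"/\">" +
        // 1) select every candidate node and sort the whole set in one loop
        "<xsl:for-each select=\"//someNode\">" +
        "<xsl:sort select=\".\"/>" +
        // 2) filter inside the loop; axes still follow the original document order
        "<xsl:if test=\"not(. = preceding::someNode)\">" +
        "<xsl:value-of select=\".\"/><xsl:text> </xsl:text>" +
        "</xsl:if>" +
        "</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the text output
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With two f blocks containing b, a and b, c, the second b is dropped because an equal value precedes it, and the remaining values come out sorted across both blocks as "a b c".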