What is the meaning of '.?select=*.xml;recurse=yes' in an XSLT string?

I've seen Michael Kay kindly respond to XSL questions with a template like the one shown below.
Where is the syntax of the string passed to collection() documented? I searched the XSL spec for some sort of wildcard pattern with recursion but came up empty.
<xsl:template name="main">
<xsl:for-each select="collection('.?select=*.xml;recurse=yes')">
<xsl:result-document href="out/{tokenize(document-uri(.), '/')[last()]}">
<xsl:apply-templates select="."/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>

Per Martin's link, it appears this behavior is specific to Saxon: its StandardCollectionURIResolver interprets ?select=*.xml;recurse=yes as a query string, where select provides file globbing and recurse enables automatic directory recursion.

Related

ibm-datapower gives "&gt" in place of ">" and "&lt" in place of "<" in resulting XML for text data using XSLT

In one of my applications, I am trying to transform the response of my service using XSLT on DataPower.
In one of the response scenarios, I need to produce XML like the following:
<data contentType="text/xml;charset=utf-8" contentLength="80"><![CDATA[Your request cannot be processed]]></data>
But my XSLT fails on DataPower: the output shows "&gt;" and "&lt;" in place of ">" and "<".
Below are some of the templates I have tried. Please have a look and suggest any corrections:
Attempt 1: Tried with "&gt;" and "&lt;"
<xsl:param name="mask" select="'Your request cannot be processed'"/>
<xsl:template match="*" mode="copyFault">
<xsl:text disable-output-escaping="yes">&lt;data contentType="text/xml;charset=utf-8" contentLength="80"&gt;&lt;![CDATA[</xsl:text>
<xsl:value-of select="$mask" />
<xsl:text disable-output-escaping="yes">]]&gt;&lt;/data&gt;</xsl:text>
</xsl:template>
Attempt 2: Tried with hex values
<xsl:param name="mask" select="'Your request cannot be processed'"/>
<xsl:variable name="lessThan" select="'&#x3C;'"/>
<xsl:variable name="GreaterThan" select="'&#x3E;'"/>
<xsl:template match="*" mode="copyFault">
<xsl:value-of disable-output-escaping = "yes" select="$lessThan"/>
<xsl:text>data contentType="text/xml;charset=utf-8" contentLength="80"</xsl:text>
<xsl:value-of disable-output-escaping = "yes" select="$GreaterThan"/>
<xsl:value-of disable-output-escaping = "yes" select="$lessThan"/>
<xsl:text>![CDATA[</xsl:text>
<xsl:value-of select="$mask" />
<xsl:text>]]</xsl:text>
<xsl:value-of disable-output-escaping = "yes" select="$GreaterThan"/>
<xsl:value-of disable-output-escaping = "yes" select="$lessThan"/>
<xsl:text>/data</xsl:text>
<xsl:value-of disable-output-escaping = "yes" select="$GreaterThan"/>
</xsl:template>
Please let me know what I should do to get properly formatted XML from DataPower.
Thanks.
The usual way to output a particular XML element in XSLT is a literal result element, so using
<data contentType="text/xml;charset=utf-8" contentLength="80">Your request cannot be processed</data>
in your XSLT will then output that element in the result. If you want to populate the element with a variable or parameter value then use e.g.
<data contentType="text/xml;charset=utf-8" contentLength="80"><xsl:value-of select="$mask"/></data>
If the XSLT processor is in charge of serializing the result to a file or string and you want some element like the data element to have a CDATA section as the content then declare e.g. <xsl:output cdata-section-elements="data"/> as a child of xsl:stylesheet (or xsl:transform if you have named the root element that way).
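Putting those pieces together, a minimal sketch might look like the following. It assumes the processor's own serializer writes the result (otherwise xsl:output is ignored), and the match="/" template is only an assumption for illustration; the question uses match="*" with mode="copyFault".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer to wrap the text content of <data> in a CDATA section. -->
  <xsl:output method="xml" cdata-section-elements="data"/>

  <xsl:param name="mask" select="'Your request cannot be processed'"/>

  <!-- Assumption: matching the document node; adapt to your own template and mode. -->
  <xsl:template match="/">
    <!-- A literal result element: no disable-output-escaping needed. -->
    <data contentType="text/xml;charset=utf-8" contentLength="80">
      <xsl:value-of select="$mask"/>
    </data>
  </xsl:template>

</xsl:stylesheet>
```

With a serializer that honors xsl:output, the text inside the data element should be emitted as a CDATA section. Whether DataPower does so depends on how the transformation result is serialized there.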
disable-output-escaping is a thoroughly nasty feature: it doesn't work on all processors, and if it's supported at all, it only works when the transformation output is fed directly into an XSLT-aware serializer, so it depends on how you are running the transformation.
It's much better to avoid disable-output-escaping when you can, and there's certainly no evidence you need it here. The requirement to output a CDATA section is somewhat unusual (any well-written application reading XML doesn't care whether the text is in a CDATA section or not), but if you really need it, then you can usually achieve it using <xsl:output cdata-section-elements="data"/>. (Though again, this only works if the output is fed into an XSLT-aware serializer.)
Certainly, generating start and end tags using disable-output-escaping is very poor practice.

How to match uri-collection results using templates

I have a variable holding a collection of file URIs.
<xsl:variable name="swiftFilesPath" select="concat($inputPath, '?select=*.swift;recurse=yes;on-error=warning')"/>
<xsl:variable name="swiftFiles" select="uri-collection($swiftFilesPath)"/>
I want to use apply-templates to process all of the URIs.
For now I'm using for-each to fetch each file and then process it line by line.
<xsl:for-each select="$swiftFiles">
[...]
<xsl:variable name="filePath" select="."/>
<xsl:variable name="fileContent" select="unparsed-text($filePath, $encoding)"/>
<xsl:for-each select="tokenize($fileContent, '\n')">
[...]
</xsl:for-each>
</xsl:for-each>
I am thinking about changing it to something like this:
<xsl:apply-templates select="$swiftFiles" mode="swiftFiles"/>
[...]
<xsl:template match="*" mode="swiftFiles">
[...]
</xsl:template>
Would that be a better approach to processing the files? I mean, is apply-templates better than for-each here?
Is there a way to avoid "*" in template match? Maybe something like "*[. castable as xs:anyURI]"?
Firstly, I don't think there's anything to gain from using apply-templates unless there's some kind of dynamic despatch going on. For example, if you had both .txt URIs and .xml URIs then you could do
<xsl:apply-templates select="uri-collection(....)" mode="dereference"/>
<xsl:template match=".[ends-with(., '.txt')]" mode="dereference">
<!-- process unparsed text file -->
</xsl:template>
<xsl:template match=".[ends-with(., '.xml')]" mode="dereference">
<!-- process XML file -->
</xsl:template>
<xsl:template match="." mode="dereference"/>
But if they are all processed the same way, then xsl:for-each does the job perfectly well.
I've answered your second question by using "." as the pattern that matches everything (atomic values included). The pattern "*" will only match element nodes.
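On the second question, an XSLT 3.0 pattern can also test the item type directly. The following is a minimal sketch, assuming an XSLT 3.0 processor such as Saxon (the ?select=... query parameters are Saxon-specific). The files, file and line element names and the default parameter values are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical defaults; the question supplies its own values. -->
  <xsl:param name="inputPath" select="'.'"/>
  <xsl:param name="encoding" select="'UTF-8'"/>

  <xsl:template name="xsl:initial-template">
    <files>
      <xsl:apply-templates mode="swiftFiles"
          select="uri-collection(concat($inputPath, '?select=*.swift;recurse=yes;on-error=warning'))"/>
    </files>
  </xsl:template>

  <!-- Matches atomic values of type xs:anyURI, not element nodes. -->
  <xsl:template match=".[. instance of xs:anyURI]" mode="swiftFiles">
    <file uri="{.}">
      <!-- unparsed-text-lines() splits the file into lines. -->
      <xsl:for-each select="unparsed-text-lines(., $encoding)">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </file>
  </xsl:template>

</xsl:stylesheet>
```

The type test only documents intent here, since every item returned by uri-collection() is already an xs:anyURI; a plain "." pattern behaves the same for this input.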

Some correct XPath expressions don't work as expected

I'm doing an XML transform on a file generated by WiX after harvesting registry data. For those unfamiliar with WiX, just consider that I'm trying to transform an XML file, no matter where it comes from. The issue I'm experiencing is this: when I use an XPath like
match="node()[name() = 'File'][not(@KeyPath)]"
then the matching works fine and finds all the File nodes that are missing the KeyPath attribute. However, if I use another XPath expression like
match="//File[not(@KeyPath)]"
then it doesn't find any match.
In general I cannot use standard XPath expressions with /, //, ., .., but the piece below, which contains an XPath example, works well:
<xsl:template match="node()[name() = 'File']
[contains(@Source, 'First.dll') or
contains(@Source, 'Second.dll')]
[not(@Assembly)]">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
<xsl:attribute name="Assembly">.net</xsl:attribute>
<xsl:attribute name="KeyPath">yes</xsl:attribute>
</xsl:copy>
</xsl:template>
but something like /bookstore/book[@lang='en'] would not work. Perhaps I'm missing some declarations at the beginning of my XSL file that enable recognition of XPaths like this.
This is because the nodes in the XML file are almost certainly in a namespace:
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
Your expression //File[not(@KeyPath)] is looking for a File element that is in no namespace. You need to account for the namespace in your XSLT.
So, bind a prefix to it on the xsl:stylesheet like so:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wix="http://schemas.microsoft.com/wix/2006/wi" />
Then your match expression becomes this:
<xsl:template match="//wix:File[not(@KeyPath)]" />
In fact, the // is not necessary in a match pattern. This will work too:
<xsl:template match="wix:File[not(@KeyPath)]" />
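As a minimal sketch of how the pieces fit into a complete stylesheet: the namespace URI is the one quoted above, while the identity template and the choice to add KeyPath="yes" are assumptions based on the template in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wix="http://schemas.microsoft.com/wix/2006/wi">

  <!-- Identity template: copy everything unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- File elements in the WiX namespace that lack a KeyPath attribute. -->
  <xsl:template match="wix:File[not(@KeyPath)]">
    <xsl:copy>
      <!-- Copy existing attributes first, then add the new one. -->
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="KeyPath">yes</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Attribute names such as KeyPath need no prefix, because unprefixed attributes are in no namespace even when their parent element is in one.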

not(@attribute) test not working in JDom XSL transform?

I have a piece of an XSLT stylesheet that works as expected using xsltproc but produces a different output in my actual application, where the transform is applied via org.jdom.transform.XSLTransformer (jdom 1.0), I believe using Xalan.
Stylesheet snippet (this is part of a larger template that starts like this: <xsl:template match="/dspace:dim[@dspaceType='ITEM']">):
<xsl:if test="//dspace:field[@mdschema='dc' and @element='rights']">
<rightsList>
<xsl:if test="//dspace:field[@mdschema='dc' and @element='rights' and not(@qualifier) and @language='*']">
<rights>
<xsl:if test="//dspace:field[@mdschema='dc' and @element='rights' and @qualifier='uri' and @language='*']">
<xsl:attribute name="rightsUri">
<xsl:value-of select="//dspace:field[@mdschema='dc' and @element='rights' and @qualifier='uri' and @language='*']"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="//dspace:field[@mdschema='dc' and @element='rights' and not(@qualifier) and @language='*']" />
</rights>
</xsl:if>
<xsl:apply-templates select="//dspace:field[@mdschema='dc' and @element='rights' and not(@language='*')]" />
</rightsList>
</xsl:if>
and
<xsl:template match="//dspace:field[@mdschema='dc' and @element='rights' and not(@language='*')]">
<rights><xsl:value-of select="." /></rights>
</xsl:template>
XML snippet:
<dim:dim dspaceType="ITEM" xmlns:dim="http://www.dspace.org/xmlns/dspace/dim">
<dim:field element="rights" language="en_NZ" mdschema="dc">Actual text redacted</dim:field>
<dim:field element="rights" language="*" mdschema="dc">Attribution 3.0 New Zealand</dim:field>
<dim:field element="rights" qualifier="uri" language="*" mdschema="dc">http://creativecommons.org/licenses/by/3.0/nz/</dim:field>
</dim:dim>
With xsltproc, this produces
<rightsList>
<rights rightsUri="http://creativecommons.org/licenses/by/3.0/nz/">Attribution 3.0 New Zealand</rights>
<rights>Actual text redacted</rights>
</rightsList>
In my application, this produces
<rightsList>
<rights>Actual text redacted</rights>
<rights>Attribution 3.0 New Zealand</rights>
<rights>http://creativecommons.org/licenses/by/3.0/nz/</rights>
</rightsList>
So to me it looks like the not(@qualifier) bit doesn't work using jdom.
I'd appreciate any insight into what's going on here and how I might change the stylesheet to get the same result in my application that I currently get via xsltproc.
Edited to add: just in case it makes any difference, the stylesheet starts out as
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dspace="http://www.dspace.org/xmlns/dspace/dim"
xmlns:exslt="http://exslt.org/common"
xmlns="http://datacite.org/schema/kernel-3"
extension-element-prefixes="exslt"
exclude-result-prefixes="exslt"
version="1.0">
and also includes this template:
<!-- Don't copy everything by default! -->
<xsl:template match="@* | text()" />
See my answer below: the XML structure is actually different from what I thought it was, so the problem wasn't in the XSL after all.
Apart from solving your original problem, let's have a quick look at how to reorganize your code.
You use a lot of //foo expressions. Starting an expression with //foo means "search the whole document, at any level, for the element with the name foo". Apart from this being a potentially expensive operation, this often has unwanted side effects and makes your code hard to read, because it requires you to specify each element uniquely, leading to a lot of duplicated code.
You also use a lot of xsl:if, but in XSLT, it is hardly ever necessary to use if-statements (an exception in XSLT 1.0 and 2.0 being when you deal with something other than nodes). In almost all cases, you can replace an xsl:if with a simple xsl:apply-templates.
That said, let's have a look how we can rewrite your code to get the same effect and have less chance for error:
<xsl:if test="//dspace:field[@mdschema='dc' and @element='rights']">
<rightsList>
.....
Is similar to having a matching template as follows (assuming you have a throw-away template for uninteresting nodes):
<xsl:template match="dspace:dim[dspace:field[@mdschema='dc' and @element='rights']]">
<rightsList>
This says: if you encounter a dim element with any field element that has those properties set, then output <rightsList>.
Then you have:
<xsl:if test="//dspace:field[@mdschema='dc' and @element='rights' and not(@qualifier) and @language='*']">
<rights>
Which is precisely equivalent to the following apply-template expression (assuming a matching template with it):
<xsl:apply-templates select="dspace:field[@mdschema='dc' and @element='rights' and not(@qualifier) and @language='*']" />
Here we find that a little bit below that, we have an almost equivalent expression, this time with not(@language='*'). So let's see if we can get rid of those duplicate expressions altogether.
First, let's go back a bit and have a look at what you were doing:
If there is a "dc" field with element "rights" anywhere, create a <rightsList>.
If any of these fields has no qualifier but has language "*", create a <rights> element.
Inside it, create a rightsUri attribute if any field has qualifier "uri" and language "*", and set its value to the first such field you find.
After this <rights> element (there can be at most one of them in your current structure), create a <rights> element for each field whose language is not "*".
If this is correct, then this can be rewritten as follows:
<xsl:template match="dspace:dim[dspace:field[@mdschema='dc' and @element='rights']]">
<xsl:variable name="adjusted">
<xsl:copy-of select="dspace:field[@mdschema='dc' and @element='rights']"/>
</xsl:variable>
<rightsList>
<xsl:apply-templates select="exsl:node-set($adjusted)/*[not(@qualifier) and @language='*'][1]" mode="noquali"/>
<xsl:apply-templates select="exsl:node-set($adjusted)/*[not(@language='*')]" />
</rightsList>
</xsl:template>
<xsl:template match="dspace:field" mode="noquali">
<rights>
<xsl:apply-templates select="/dspace:field[@qualifier='uri' and @language='*'][1]" mode="uri"/>
<xsl:value-of select="."/>
</rights>
</xsl:template>
<xsl:template match="dspace:field" mode="uri">
<xsl:attribute name="rightsUri"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>
<!-- matching anything else -->
<xsl:template match="dspace:field">
<rights><xsl:value-of select="." /></rights>
</xsl:template>
The exsl:node-set function is supported by just about every XSLT 1.0 processor; just add the namespace xmlns:exsl="http://exslt.org/common" to your xsl:stylesheet declaration.
Note that I added [1] to several of the select expressions. Your original code doesn't do that but has the same effect, because xsl:value-of in XSLT 1.0 only uses the first node. With apply-templates, if there are multiple matches, you have to state explicitly that you are only interested in the first one.
I think your code can be further simplified, but I wanted to make sure that the logic remains exactly the same. As you can see, the end result is without any //. However, you do see one /, which now points to the root of the node-set, which conveniently contains only the nodes you are interested in: the ones with mdschema "dc" and element "rights", so we do not have to repeat that expression over and over again.
You may try this rewrite and see if it helps with your current bug; otherwise I'll gladly help you further.
Edit
After your edit, your original context item will have been dspace:dim already. If you don't mind always outputting <rightsList> (even if it ends up empty), you can simply replace my first template match pattern above with your existing dspace:dim pattern.
Duh. Forest/trees indeed. Even though the language attribute is called "language" pretty much everywhere else in the application (see also, the XML snippet I gave), it is actually called "lang" in the XML that my stylesheet operates on - I finally gave in and used this answer to be sure what the XML structure is. Surprise!
Anyway, I followed the advice Abel gave in his answer in part and simplified the templates for this particular case quite a bit. I now just have
<xsl:if test="dspace:field[@mdschema='dc' and @element='rights']">
<rightsList>
<xsl:apply-templates select="dspace:field[@mdschema='dc' and @element='rights']"/>
</rightsList>
</xsl:if>
in the big template, plus a couple of custom ones:
<xsl:template match="dspace:field[@mdschema='dc' and @element='rights']">
<xsl:choose>
<xsl:when test="@qualifier='uri'"/>
<xsl:otherwise>
<rights>
<xsl:if test="@lang='*'">
<xsl:apply-templates select="//dspace:field[@mdschema='dc' and @element='rights' and @qualifier='uri' and @lang='*'][1]" mode="rightsURI"/>
</xsl:if>
<xsl:value-of select="."/>
</rights>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="dspace:field[@mdschema='dc' and @element='rights' and @qualifier='uri' and @lang='*']" mode="rightsURI">
<xsl:attribute name="rightsURI"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>

XSLT Selection of Nodes Based on Substring of Element Name

How can I, with XSLT, select nodes based on a substring of the nodes' element name?
For example, consider the XML:
<foo_bar>Keep this.
<foo_who>Keep this, too.
<fu_bar>Don't want this.</fu_bar>
</foo_who>
</foo_bar>
From which I want to output:
<foo_bar>Keep this.
<foo_who>Keep this, too.
</foo_who>
</foo_bar>
Here I want to select for processing those nodes whose names match a regex like "foo.*".
I think I need an XSLT template match attribute expression, or an apply-templates select attribute expression, that applies the regex to the element's name. But maybe this can't be done without some construct like a conditional statement?
Any help would be appreciated.
Here is some XSL that finds elements that start with "foo" to get you started. I don't think regex functionality was added until XSLT 2.0 based on Regular Expression Matching in XSLT 2.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
<xsl:variable name="name" select="local-name()"/>
<xsl:if test="starts-with($name, 'foo')">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
It gives this output, which seems to have an extra newline.
<foo_bar>Keep this.
<foo_who>Keep this, too.
</foo_who>
</foo_bar>
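If an XSLT 2.0 (or later) processor is available, the regex can go straight into the match pattern via the matches() function. A minimal sketch, assuming the same input as in the question and the regex ^foo:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy elements whose local name matches the regex ^foo. -->
  <xsl:template match="*[matches(local-name(), '^foo')]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop every other element together with its content. -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

The template with the predicate has a higher default priority than the bare "*" template, so no explicit priority attribute is needed. Text inside kept elements is copied by the built-in text template.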