Choose XSL transform according to document content - xslt

I have a large number of XHTML documents created by different publishers; the publisher is identified by a meta tag:
<meta name="citation_publisher" content="ACME publisher"/>
or, in a different document:
<meta name="citation_publisher" content="BETA publisher"/>
etc.
I have written a stylesheet for each publisher (about one page each), such as:
acme.xsl
beta.xsl
etc.
However, I do not know the name of the publisher until I read the XHTML file.
It is possible, though very messy, to write a gigantic stylesheet of the form:
<xsl:choose>
<xsl:when test="$publisher='ACME publisher'">
<!-- acme.xsl sheet-->
</xsl:when>
<xsl:when test="$publisher='BETA publisher'">
<!-- beta.xsl sheet-->
</xsl:when>
</xsl:choose>
but there are at least 100 XSL files.
Is there any way, in XSLT 1.0, to select the stylesheet chunk according to the publisher? It would be nice to keep the stylesheets as separate files and <xsl:import> them rather than have a single huge file.
UPDATE:
I think @Dimitre has answered the question correctly (and so I have accepted it). I suspect that @MichaelKay's answer is actually better, but it depends on having a pipeline that manages the transformer. I shall try the <xsl:include> as a prototype and see whether it has downsides.

I wouldn't attempt to do this within a single XSLT stylesheet. It sounds to me like a good candidate for XProc, or some similar pipeline technology (e.g. Orbeon). Step 1: use XPath to classify the document. Step 2: transform it using the stylesheet chosen according to the result of Step 1.
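Here is a minimal sketch of Step 1, written in Python with the standard library rather than XProc. It classifies the document by its citation_publisher meta tag and picks a stylesheet file. The assumptions are: the meta tag is in the XHTML namespace, and the publisher-to-file mapping and function names are hypothetical. Step 2 would hand the chosen file to whatever XSLT processor the pipeline uses (for example `xsltproc acme.xsl input.html`).

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from publisher name to stylesheet file.
STYLESHEETS = {
    "ACME publisher": "acme.xsl",
    "BETA publisher": "beta.xsl",
}

def classify(xhtml_text: str) -> Optional[str]:
    """Step 1: return the citation_publisher meta content, or None."""
    root = ET.fromstring(xhtml_text)
    # Only <meta> elements in the XHTML namespace are considered.
    for meta in root.iter(f"{{{XHTML_NS}}}meta"):
        if meta.get("name") == "citation_publisher":
            return meta.get("content")
    return None

def choose_stylesheet(xhtml_text: str) -> str:
    """Pick the stylesheet for Step 2, failing loudly for unknown publishers."""
    publisher = classify(xhtml_text)
    if publisher not in STYLESHEETS:
        raise ValueError(f"Unknown content source: {publisher!r}")
    return STYLESHEETS[publisher]

doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><head>'
       '<meta name="citation_publisher" content="ACME publisher"/>'
       '</head><body/></html>')
print(choose_stylesheet(doc))  # acme.xsl
```

Keeping the mapping outside any stylesheet means adding publisher number 101 is a one-line change, and each stylesheet stays a standalone file.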

Here is one way to do this. I show it working with just two publisher types, but it can be done for as many as needed:
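Primary stylesheet. Later xsl:import elements have higher import precedence, so the template in unknown.xsl, which is imported first, is used only when no publisher-specific template matches: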
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="unknown.xsl"/>
<xsl:import href="acme.xsl"/>
<xsl:import href="beta.xsl"/>
</xsl:stylesheet>
acme.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*[meta[@content='ACME publisher']]">
<xsl:value-of select="x * y"/>
</xsl:template>
</xsl:stylesheet>
beta.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*[meta[@content='BETA publisher']]">
<xsl:value-of select="x + y"/>
</xsl:template>
</xsl:stylesheet>
unknown.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<xsl:message terminate="yes">Error: Unknown content source</xsl:message>
</xsl:template>
</xsl:stylesheet>
When the transformation specified in the primary stylesheet is applied to this XML document:
acme.xml:
<t>
<meta name="citation_publisher" content="ACME publisher"/>
<x>6</x>
<y>4</y>
</t>
the wanted, correct result (x*y) is produced:
24
When the same transformation is applied to this XML document:
beta.xml:
<t>
<meta name="citation_publisher" content="BETA publisher"/>
<x>6</x>
<y>4</y>
</t>
again the correct result (x+y) is produced:
10
Finally, when the same transformation is applied to this XML document:
other.xml:
<t>
<meta name="citation_publisher" content="OTHER publisher"/>
<x>6</x>
<y>4</y>
</t>
the transformation terminates, as wanted, with this error message:
Error: Unknown content source
Processing terminated by xsl:message at line 5
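The primary stylesheet depends on one rule: when several templates match, the template from the stylesheet with the highest import precedence wins. The following Python sketch models that dispatch for the three sample documents above. It is not XSLT, and the names `RULES`, `publisher_is` and `transform` are hypothetical. Rules are listed from highest to lowest precedence, which is the reverse of the xsl:import order.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# A rule pairs a match test with an action, like one imported stylesheet.
Rule = Tuple[Callable[[ET.Element], bool], Callable[[ET.Element], str]]

def publisher_is(name: str) -> Callable[[ET.Element], bool]:
    # Rough analogue of the pattern /*[meta[@content=name]]
    return lambda root: any(m.get("content") == name for m in root.findall("meta"))

def num(root: ET.Element, tag: str) -> float:
    return float(root.findtext(tag))

def unknown(root: ET.Element) -> str:
    # Analogue of <xsl:message terminate="yes">
    raise RuntimeError("Error: Unknown content source")

# Highest precedence first: beta.xsl (imported last), acme.xsl, unknown.xsl.
RULES: List[Rule] = [
    (publisher_is("BETA publisher"), lambda r: f"{num(r, 'x') + num(r, 'y'):g}"),
    (publisher_is("ACME publisher"), lambda r: f"{num(r, 'x') * num(r, 'y'):g}"),
    (lambda r: True, unknown),  # the fallback matches every document element
]

def transform(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    # The first matching rule wins, as the highest-precedence template does.
    for matches, action in RULES:
        if matches(root):
            return action(root)
    raise AssertionError("unreachable: the fallback rule matches everything")
```

For the ACME sample this returns "24", and for the BETA sample it returns "10". For the OTHER sample it raises the same error text that the stylesheet prints.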


Is it possible to store the parser error message in a variable using XSLT 2.0 or XSLT 3.0?

I am transforming an XML file using XSLT, and I want to display the XML parser's error message in an element.
Note: the error message should be the parser's original message.
I am not sure there is a way to capture XML parsing errors of the primary input document in an apply-templates-based XSLT 3 transformation. In general, however, XSLT 3 with xsl:try/xsl:catch lets you capture and handle run-time errors. You can therefore use try/catch to handle parsing errors, provided you can organize the rest of your code to load and parse any XML documents with the doc or document function (for instance, by using a named template as the starting point). An example is https://xsltfiddle.liberty-development.net/ej9EGcg/2
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:err="http://www.w3.org/2005/xqt-errors"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template match="/">
<root>
<xsl:try>
<xsl:variable name="doc1" select="doc('https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2019/test2019032601.xml')"/>
<xsl:value-of select="count($doc1//item)"/>
<xsl:catch>Error code: <xsl:value-of select="$err:code"/>
Reason: <xsl:value-of select="$err:description"/>
</xsl:catch>
</xsl:try>
</root>
</xsl:template>
</xsl:stylesheet>
Depending on your needs, this can also be reduced to using the relevant XPath expression directly in the select attribute of the xsl:try element, e.g. https://xsltfiddle.liberty-development.net/ej9EGcg/3
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:err="http://www.w3.org/2005/xqt-errors"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template match="/">
<root>
<xsl:try select="count(doc('https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2019/test2019032601.xml')//item)">
<xsl:catch>Error code: <xsl:value-of select="$err:code"/>
Reason: <xsl:value-of select="$err:description"/>
</xsl:catch>
</xsl:try>
</root>
</xsl:template>
</xsl:stylesheet>
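For comparison, here is the same try/catch idea as a Python sketch using only the standard library. The function name `count_items` is hypothetical. The parser exception is caught, and its original message is kept instead of the transformation stopping.

```python
import xml.etree.ElementTree as ET

def count_items(xml_text: str) -> str:
    """Return the number of item elements, or the parser's own error message."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # str(err) is the message produced by the underlying expat parser;
        # err.code is its numeric error code (err.position gives line, column).
        return f"Error code: {err.code}\nReason: {err}"
    return str(len(root.findall(".//item")))

print(count_items("<root><item/><item/></root>"))
print(count_items("<root><item></root>"))
```

As in the XSLT version, the error is only catchable here because the code itself loads the document, inside the try block.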

XSLT: How to parse HTML embedded in XML tags

I came across a situation where an XML element contains HTML code that needs to be parsed in XSLT. Here is the XML sample:
<note>
<text>&lt;p&gt;This is a paragraph.&lt;/p&gt;&lt;p&gt;This is another paragraph.&lt;/p&gt;</text>
</note>
I want the embedded paragraph elements to be stored in different variables.
"This is a paragraph." should be stored in one variable, and "This is another paragraph." should be stored in another.
Can you please help?
Parsing XML documents with parse-xml (https://www.w3.org/TR/xpath-functions/#func-parse-xml) or XML fragments with parse-xml-fragment (https://www.w3.org/TR/xpath-functions/#func-parse-xml-fragment) is supported in XSLT 3.0. In earlier versions you would have to rely on a processor-specific extension function, either one the processor provides or one you implement yourself.
Your escaped code looks like an XHTML fragment, so it should be parseable with parse-xml-fragment, as in https://xsltfiddle.liberty-development.net/bFDb2CQ, which does:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="html" indent="yes" html-version="5"/>
<xsl:template match="text">
<div>
<xsl:variable name="contents" select="parse-xml-fragment(.)"/>
<xsl:variable name="p1" select="$contents/p[1]"/>
<xsl:variable name="p2" select="$contents/p[2]"/>
<xsl:sequence select="$p1, $p2"/>
</div>
</xsl:template>
</xsl:stylesheet>
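The wrap-and-parse idea behind parse-xml-fragment can be sketched in Python with the standard library. This assumes the text content is escaped markup, as in the question. The `<fragment>` wrapper element name is an arbitrary choice for this sketch. A fragment may have several top-level elements, so it needs a single root before an XML parser will accept it.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring(
    "<note><text>&lt;p&gt;This is a paragraph.&lt;/p&gt;"
    "&lt;p&gt;This is another paragraph.&lt;/p&gt;</text></note>"
)

# After parsing, the text content is the markup itself, as a plain string.
escaped = note.findtext("text")

# Wrap the fragment in a dummy root element so it becomes a well-formed document.
contents = ET.fromstring(f"<fragment>{escaped}</fragment>")

# Store each paragraph's text in its own variable, like $p1 and $p2.
p1 = contents.findtext("p[1]")
p2 = contents.findtext("p[2]")
print(p1)
print(p2)
```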

How to get rid of xmlns: attributes in an XSL transformation

I have an XSL transformation that generates ASP.NET user controls (.ascx).
My XSL is defined this way:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:asp="System.Web.UI.WebControls"
exclude-result-prefixes="asp msxsl"
>
<xsl:output method="xml" indent="no" omit-xml-declaration="yes" />
From that exclude-result-prefixes, I would assume that elements with the asp prefix would not get the namespace declaration. However, this template, for example:
<xsl:template match="Label">
<asp:Label runat="server" AssociatedControlID="{../@id}">
<xsl:copy-of select="./text()"/>
</asp:Label>
</xsl:template>
fed with this xml:
<Label>Label Text</Label>
results in this output:
<asp:Label runat="server" AssociatedControlID="SomeName" xmlns:asp="System.Web.UI.WebControls">Label Text</asp:Label>
So what do I need to do to prevent xmlns:asp="..." from showing up on every single element in my result?
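This is impossible, at least in MSXML, because the output would not be namespace-well-formed. asp:Label uses the asp prefix, so a declaration binding that prefix must be in scope. exclude-result-prefixes only suppresses namespace declarations that the result's element and attribute names do not need. You can only output such markup as text, e.g. using CDATA.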
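Here is a small Python sketch (standard library only) of why the declaration cannot be dropped. A namespace-aware serializer writes xmlns:asp for an element in that namespace, and a namespace-aware parser rejects the same markup without it. The helper `parses` is hypothetical. This mirrors the reasoning above and does not demonstrate anything MSXML-specific.

```python
import xml.etree.ElementTree as ET

ASP = "System.Web.UI.WebControls"
ET.register_namespace("asp", ASP)

label = ET.Element(f"{{{ASP}}}Label", runat="server")
label.text = "Label Text"

# The serializer must declare the prefix that the element name uses.
serialized = ET.tostring(label, encoding="unicode")
print(serialized)

def parses(markup: str) -> bool:
    """True if a namespace-aware parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Without the declaration the asp: prefix is unbound, so parsing fails.
print(parses('<asp:Label runat="server">Label Text</asp:Label>'))
```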

xsltproc doesn't select elements by name

I am trying to transform XHTML using an XSLT stylesheet, but I can't even get a basic stylesheet to match anything. I'm sure I'm missing something simple.
Here's my XHTML source document (no big surprises):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 25 March 2009), see www.w3.org" />
...
</body>
</html>
The actual contents don't matter too much, as I'll demonstrate below. By the way, I'm pretty sure the document is well-formed since it was created via tidy -asxml.
My more complex XPath expressions were not returning any results, so as a sanity test, I'm trying to transform it very simply using the following stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:text>---[</xsl:text>
<xsl:for-each select="html">
<xsl:text>Found HTML element.</xsl:text>
</xsl:for-each>
<xsl:text>]---</xsl:text>
</xsl:template>
</xsl:stylesheet>
The transform is done via xsltproc --nonet stylesheet.xsl input.html, and the output is "---[]---" (i.e., it didn't find the html element). However, if I change the for-each section to:
<xsl:for-each select="*">
<xsl:value-of select="name()"/>
</xsl:for-each>
Then I get "---[html]---". And similarly, if I use for-each select="*/*" I get "---[headbody]---" as I would expect.
Why can it find the child element via * (with name() giving the correct name) but it won't find it using the element name directly?
The html element in your source XML declares a namespace. You have to bind a prefix to that namespace on your xsl:stylesheet element and use the prefix in your select expression:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:html="http://www.w3.org/1999/xhtml">
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:text>---[</xsl:text>
<xsl:for-each select="html:html">
<xsl:text>Found HTML element.</xsl:text>
</xsl:for-each>
<xsl:text>]---</xsl:text>
</xsl:template>
</xsl:stylesheet>
Change your stylesheet from:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:text>---[</xsl:text>
<xsl:for-each select="html">
<xsl:text>Found HTML element.</xsl:text>
</xsl:for-each>
<xsl:text>]---</xsl:text>
</xsl:template>
</xsl:stylesheet>
to:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://www.w3.org/1999/xhtml"
>
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:text>---[</xsl:text>
<xsl:for-each select="x:html">
<xsl:text>Found HTML element.</xsl:text>
</xsl:for-each>
<xsl:text>]---</xsl:text>
</xsl:template>
</xsl:stylesheet>
Explanation:
The XML document declares a default namespace, "http://www.w3.org/1999/xhtml". Every unprefixed element at or below the element that declares it belongs to this namespace.
On the other hand, in XPath 1.0 an unprefixed name in a node test is always considered to be in "no namespace".
Therefore, the <xsl:for-each select="html"> instruction selects, and applies its body to, all html elements that are in "no namespace". There are none in the document: the only html element is in the XHTML namespace.
Solution:
In XPath 1.0, names that belong to a default namespace cannot be referenced unprefixed. We therefore need to bind a prefix to the namespace that such an element belongs to. If this prefix is "x:", we can reference any such element by prefixing its name with "x:".
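Python's ElementTree path language behaves the same way here. An unprefixed name means "no namespace", and a prefix has to be bound to the XHTML namespace explicitly. The sketch below uses only the standard library, and the small document is a made-up stand-in for the tidy output.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
doc = ET.fromstring(
    f'<html xmlns="{XHTML}"><head><title>t</title></head><body/></html>'
)

# The expanded name of the root carries the namespace URI.
root_tag = doc.tag

# An unprefixed name means "no namespace", so this finds nothing ...
head_unprefixed = doc.find("head")

# ... while binding a prefix to the XHTML namespace works, just like x:html.
ns = {"x": XHTML}
head_prefixed = doc.find("x:head", ns)

print(root_tag)
print(head_unprefixed)
print(head_prefixed.find("x:title", ns).text)
```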
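A workaround that avoids declaring the namespace, so that the stylesheet accepts an html element in any namespace. Note that name() returns the name with whatever prefix the source document uses, so local-name() is the more robust test: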
<xsl:template match="*[name()='html']" >
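Comparing local names is the prefix-independent form of this workaround. The Python sketch below (standard library; the helper `local_name` is hypothetical) applies the same idea. It accepts an html element whether it is in the XHTML namespace with a default declaration, with a prefix, or in no namespace. The `{*}` wildcard at the end needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    # Strip the '{uri}' part that ElementTree puts in front of namespaced
    # names, roughly what XPath's local-name() returns.
    return tag.rsplit("}", 1)[-1]

docs = [
    '<html xmlns="http://www.w3.org/1999/xhtml"/>',
    '<h:html xmlns:h="http://www.w3.org/1999/xhtml"/>',
    '<html/>',
]
matches = [local_name(ET.fromstring(d).tag) == "html" for d in docs]
print(matches)

# ElementTree's {*} wildcard matches a local name in any namespace.
body = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
).find("{*}body")
print(body is not None)
```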

Processing an XML file with public doctype

I'm trying to process an SVG file with XSLT, and I'm seeing behavior I don't understand that involves the doctype declaration.
Here are two tests I've done. The first one gives me the expected result; the second gives me a result I don't understand (tested with Saxon and Xalan).
Stylesheet used for both tests:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="text()" >
</xsl:template>
<xsl:template match="/">
<xsl:text>/</xsl:text>
<xsl:apply-templates />
</xsl:template>
<xsl:template match="svg">
<xsl:text>svg</xsl:text>
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
Test n°1
Source file:
<?xml version="1.0"?>
<svg width="768" height="430">
</svg>
Result:
/svg
Test n°2
Source file:
<?xml version="1.0"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN"
"http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd">
<svg width="768" height="430">
</svg>
Result:
/
Why does the doctype declaration modify the behavior of the processing?
The SVG elements are in the SVG namespace.
The DTD defines this, so:
<xsl:template match="svg">
matches an element named svg that is in no namespace. All the elements in the XML document are in the SVG namespace, so this template doesn't match any node.
This explains the output.
Solution: Replace the template matching svg with one that matches svg in the SVG namespace, as in the following transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:s="http://www.w3.org/2000/svg"
>
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="text()" >
</xsl:template>
<xsl:template match="/">
<xsl:text>/</xsl:text>
<xsl:apply-templates />
</xsl:template>
<xsl:template match="s:svg">
<xsl:text >svg</xsl:text>
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN"
"http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd">
<svg width="768" height="430" >
</svg>
the wanted result is produced:
/svg
Update:
Several people have asked me: "How can a DTD set a (default) namespace?"
Here is an answer: XML, and DTDs with it, became a W3C Recommendation before Namespaces in XML did. In pre-namespace XML, a namespace declaration is simply an attribute.
DTDs can specify "default attributes": attributes that may be omitted from an instance but are automatically added with a default value.
So, one way to define a default namespace in a DTD is to declare a default xmlns attribute for the top element of the document. This only takes effect when the parser actually reads the DTD. Non-validating parsers are not required to read an external DTD, and if a parser skips it, the svg element stays in no namespace.
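The default-attribute mechanism can be reproduced with Python's standard-library parser under one assumption. Here the defaulting xmlns attribute is declared in an internal DTD subset, because ElementTree's expat-based parser does not fetch external DTDs such as the SVG one. The same svg start tag then lands in no namespace or in the SVG namespace, depending only on the DTD.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

# No DTD: the svg element is in no namespace (what match="svg" expects).
plain = ET.fromstring('<svg width="768" height="430"/>')

# Internal DTD subset that gives svg a fixed default xmlns attribute,
# standing in for the external SVG DTD.
with_dtd = ET.fromstring(
    '<!DOCTYPE svg [\n'
    f'  <!ATTLIST svg xmlns CDATA #FIXED "{SVG}">\n'
    ']>\n'
    '<svg width="768" height="430"/>'
)

print(plain.tag)     # no namespace
print(with_dtd.tag)  # expanded name in the SVG namespace
```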