How to process HTML entities in XSLT - xslt

I am trying to transform XHTML that contains the entity. Saxon complains that the entity is not defined. How can I define it?
Is it possible to add the entity definition at the beginning of the stylesheet? As suggested
here:
http://s-n-ushakov.blogspot.com/2011/09/xslt-entities-java-xalan.html
or here:
Using an HTML entity in XSLT (e.g. )
My puny attempt, ignored by Saxon, was to add the following to the beginning of the XSLT:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE stylesheet [
<!ENTITY nbsp " ">
]>
I am using Saxon 9.9 PE.
The HTML I am trying to transform is a complete document, not just a fragment.

One possibility is to pass the URL of the XHTML to the XSLT as a parameter, which would read the XHTML as text using the unparsed-text() function, expand the entity reference using the replace() function, and parse the result using the parse-xml() function. e.g.
<xsl:template name="xsl:initial-template">
<xsl:param name="source"/>
<xsl:apply-templates select="
$source
=> unparsed-text()
=> replace('&nbsp;', '&#x000A0;')
=> parse-xml()
"/>
</xsl:template>

If the input document contains an entity reference that isn't declared in the DOCTYPE declaration, then it isn't a well-formed XML document, and therefore it isn't a well-formed XHTML document; and if it isn't well-formed, then Saxon can't handle it.
It would be best to look at the processing workflow that generated this ill-formed document and fix it so the documents it produces are well-formed.
If you can't do that, then you might be able to parse it as HTML. Saxon has an extension function saxon:parse-html(); or if your application is in Java then you could create a SAXSource that uses validator.nu as its XMLReader.

You should consider using the tool Tidy and convert html files into xhtml. It corrects all such things.
Just run tidy with the argument -asxml.

Related

Passing a node as parameter to a XSL stylesheet

I need to pass a node as a parameter to an XSL stylesheet. The issue is that the parameter gets sent as a string. I have seen the several SO questions regarding this topic, and I know that the solution (in XSLT 1.0) is to use an external node-set() function to transform the string to a node set.
My issue is that I am using eXist DB I cannot seem to be able to get its XSLT processor to locate any such function. I have tried the EXSLT node-set() from the namespace http://exslt.org/common as well as both the Saxon and Xalan version (I think eXist used to use Xalan but now it might be Saxon).
Are these extensions even allowed in the XSLT processor used by eXist? If not, is there something else I can do?
To reference or transform documents from the database, you should pass the path as a parameter to the transformation, and then refer to it using a parameter and variable
(: xquery :)
let $path-to-document := "/db/test/testa.xml"
let $stylesheet :=
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="source" required="no"/>
<xsl:variable name="error"><error>doc not available</error></xsl:variable>
<xsl:variable name="theDoc" select="if (doc-available($source)) then doc($source) else $error"/>
<xsl:template match="/">
<result><xsl:value-of select="$source"/> - <xsl:value-of select="node-name($theDoc/*)"/></result>
</xsl:template>
</xsl:stylesheet>
return transform:transform(<dummy/>,$stylesheet, <parameters><param name="source" value="xmldb:exist://{$path-to-document}"/></parameters>)
As per Martin Honnen's comments I don't think it is possible to pass an XML node via the <parameters> structure of the transform:transform() function in eXist. The function seems to strip away any XML tags passed to it as a value.
As a workaround I will wrap both my input XML and my parameter XML into a root element and pass that as input to the transform function.

using xsl:param variable in xs:schema element output

I am trying to output the following line from an XSLT script. It is the first line just after xsl:template match="/". What I am trying to do is to transform XML document into an XML schema and need to output the xs:schema tag in particular way.
<xs:schema xmlns:ed="http://test1" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" targetNamespace="{$ns_name}" xmlns:tns="{$ns_name}" elementFormDefault="qualified" attributeFormDefault="unqualified" xsi:schemaLocation="http://test1 file://XmlSchemaAppinfo.xsd">
the $ns_name is a xsl:param name="ns_name". It is resolved in the targetNamespace="{$ns_name}" correctly but in the xmlns:tns="{$ns_name}" it is output literally
<xs:schema targetNamespace="akolodk" elementFormDefault="qualified" attributeFormDefault="unqualified" xsi:schemaLocation="http://test1 file://XmlSchemaAppinfo.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ed="test1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="{$ns_name}">
Namespace declarations are not treated the same as attributes, even though they look the same. The xmlns:tns will have been processed by the XML parser when the stylesheet was parsed, before it gets to the XSLT processor.
If you have XSLT 2.0 you can use
<xsl:namespace name="tns" select="$ns_name"/>
to create a namespace node in the result tree but there's no easy way I know of to generate a dynamic namespace in XSLT 1.0. You can't use xsl:attribute, the spec explicitly states that while
<xsl:attribute name="xmlns:xsl" namespace="whatever">http://www.w3.org/1999/XSL/Transform</xsl:attribute>
is not an error, it will generate an attribute, not a namespace declaration - the processor is required to ignore the xmlns prefix specified in the name and must use a different prefix to output the attribute.
If your processor supports the exslt node-set extension function then the following might work:
<xsd:schema .....>
<xsl:variable name="tnsElement">
<xsl:element name="tns:dummy" namespace="{$ns_name}"/>
</xsl:variable>
<xsl:copy-of select="exsl:node-set($tnsElement)/*/namespace::tns"/>
but again the processor is allowed to ignore the prefix of the xsl:element name attribute and use a different prefix bound to the same URI, you'll have to test it with your processor.
(and you'll have to add xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl" to your xsl:stylesheet element).
In XSL only some attributes can be written using attribute value templates (using the '{}' notation). In particular, xmlns attributes do not support the notation.

XSLT transformation passing parameters

I am trying to pass parameters during an XSLT transformation. Here is the xsl stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="param1" select="'defaultval1'" />
<xsl:param name="param2" select="'defaultval2'" />
<xsl:template match="/">
<xslttest>
<tagg param1="{$param1}"><xsl:value-of select="$param2" /></tagg>
</xslttest>
</xsl:template>
</xsl:stylesheet>
The following in the java code.
File xsltFile = new File("template.xsl");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document stylesheet = builder.parse("template.xsl");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer xsltTransformer = transformerFactory.newTransformer(new DOMSource(stylesheet));
//Transformer xsltTransformer = transformerFactory.newTransformer(new StreamSource(xsltFile));
xsltTransformer.setParameter("param1", "value1");
xsltTransformer.setParameter("param2", "value2");
StreamResult result = new StreamResult(System.out);
xsltTransformer.transform(new DOMSource(builder.newDocument()), result);
I get following errors:-
ERROR: 'Variable or parameter 'param1' is undefined.'
FATAL ERROR: 'Could not compile stylesheet'
However, if i use the following line to create the transformer everything works fine.
Transformer xsltTransformer = transformerFactory.newTransformer(new StreamSource(xsltFile));
Q1. I just wanted to know whats wrong in using a DOMSource in creating a Transformer.
Q2. Is this one of the ideal ways to substitute values for placeholders in an xml document? If my placeholders were in a source xml document is there any (straightforward) way to substitute them using style sheets (and passing parameters)?
Q1: This is a namespace awareness problem. You need to make the DocumentBuilderFactory namespace aware:
factory.setNamespaceAware(true);
Q2: There are several ways to get the values from an external xml file. One way to do this is with the document function and a top level variable in the document:
<!-- Loads a map relative to the template. -->
<xsl:variable name="map" select="document('map.xml')"/>
Then you can select the values out of the map. For instance, if map.xml was defined as:
<?xml version="1.0" encoding="UTF-8"?>
<mappings>
<mapping key="value1">value2</mapping>
</mappings>
You could remove the second parameter from your template, then look up the value using this line:
<tagg param1="{$param1}"><xsl:value-of select="$map/mappings/mapping[#key=$param1]"/></tagg>
Be aware that using relative document URIs will require that the stylesheet has a system id specified, so you will need to update the way you create your DOMSource:
DOMSource source = new DOMSource();
source.setNode(stylesheet);
source.setSystemId(xsltFile.toURL().toString());
In general, I suggest looking at all of the options that are available in Java's XML APIs. Assume that all of the features available are set wrong for what you are trying to do. I also suggest reading the XML Information Set. That specification will give you all of the definitions that the API authors are using.

Saxon Transformerfactory placing unneceary xmlns:xsd="http://www.w3.org/2001/XMLSchema" in transformer output

I am using Sax transformer factory to do an XSLT transformation on large set of xsd files, so a particular line the xslt is as follows.
<xsl:result-document href="{$fileName}"
doctype-public="-//OASIS//DTD DITA Reference//EN"
doctype-system="reference.dtd">
<reference id="{$guid}" xml:lang="EN-US" outputclass="landscape">
<title>
<xsl:value-of select="$typeName"/>
</title>
<abstract>....
the reference tag being the root of the document, but the result has an unwanted xmlns:xsd attribute shown below.
...<reference xmlns:xsd="http://www.w3.org/2001/XMLSchema"
id="RANDOM-ID".....
this additional attribute is causing problems with parser that uses the transformed xml.
is this an issue with the XSLT or with SAXON api, how can i avoid this?
By default the xsl transformation will copy namespaces that are defined in the stylesheet to the output document. You can exclude this namespace by specifying the exclude-result-prefixes on the xsl:stylesheet or the reference element with a value of "xsd".
Here is the relevant part of the xslt sepcification:
The created element node will also have a copy of the namespace nodes that were present on the element node in the stylesheet (...)
A namespace URI is designated as an excluded namespace by using an exclude-result-prefixes attribute on an xsl:stylesheet element or an xsl:exclude-result-prefixes attribute on a literal result element. The value of both these attributes is a whitespace-separated list of namespace prefixes.

XSLT functions and namespaces

I'm kind of new to XSLT, and I've gotten basic transformation done. Next I want to try out date manipulations, since my data will have timestamps. However, I can't seem to get any date functions to work, and it greatly frustrates me. I'm testing using Firefox 3.5, xsltproc 1.1.24, xalan 1.10, and XMLSpy 2009, and they all say that the functions I'm trying to use don't exist.
My xml looks like so:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="datetime.xsl"?>
<watcher>
<event id="1" date="2009-09-04T13:49:10-0500" type="ABCD">This is a test </event>
</watcher>
</code>
My xsl looks like so:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://www.w3.org/2005/02/xpath-functions"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="event[#type='ABCD']">
<!-- Date: <xsl:value-of select="day-from-dateTime(xs:dateTime(#date))"/> -->
<!-- Date: <xsl:value-of select="day-from-dateTime(#date)"/> -->
Date: <xsl:value-of select="fn:day-from-dateTime(#date)"/>
</xsl:template>
</xsl:stylesheet>
If I make the stylesheet version 2, XMLSpy complains that it can't cast my date: XSLT 2.0 Debugging Error: Error in XPath 2.0 expression (Cast failed, invalid lexical value - xs:dateTime '2009-09-04T13:49:10-0500')
If I leave it as version 1, it complains about a different error: XSLT 1.0 Debugging Error: Error in XPath expression (Unknown function - Name and number of arguments do not match any function signature in the static context - 'day-from-dateTime')
Anytime I try to change the XSL to use a namespace, such as fn:day-from-dateTime, it refuses to work at all, with all of my parsers saying that The function number 'http://www.w3.org/2005/02/xpath-functions:day-from-dateTime' is not available and variants thereof. I know from other tests that I can use the substring() function perfectly, without needing any namespace prefix, and I believe it's in the same namespace as day-from-dateTime.
I feel like it's something incredibly easy, since all of the tutorials show functions being used, but something seems to be eluding me. Could someone show me what I'm missing?
Ouch, nasty versions thing going on here. A lot of the issues you're seeing will be because the XSLT processor you're using doesn't support XPath 2.0, which is where that day-from-dateTime function comes from.
I can get what you're trying to do to work, with a Saxon processor - Saxon-B 9.1.0.6 as my processor instead of Xalan. (Xalan appears to support XPath 1.0 only, according to the documentation)
There are a few errors in your documents:
The source document should have the timezone as 05:00, not 0500
<?xml version="1.0" encoding="UTF-8"?>
<watcher>
<event id="1" date="2009-09-04T13:49:10-05:00" type="ABCD">This is a test </event>
</watcher>
The XSLT should cast the string 2009-09-04T13:49:10-05:00 into a xs:dateTime, which is what type the argument of day-from-dateTime needs to be.
Date: <xsl:value-of select="day-from-dateTime(xs:dateTime(#date))"/>
And then it works
<?xml version="1.0" encoding="UTF-8"?>
Date: 4
Hope that helps,