XSL disable-output-escaping XML SPY vs SAXON - xslt

I need help with my XSLT.
I have an XML with encoded HTML tags with a tag:
Using XmlSpy (Altova) this DOES work:
'<xsl:value-of select="de" disable-output-escaping="yes"/>'
which returns html tags within the data tag.
But running the same stylesheet on Saxon does not work. The transformation executes and returns output, but disable-output-escaping seems to be ignored.
Any ideas?

The key thing to remember is that disable-output-escaping is an instruction to the serializer, and it has no effect unless the XSLT processor is serializing the output. The most common reason for it "not working" is that the transformation output is going to a destination other than the serializer (for example, a DOM tree). So we need to know how you are running the transformation.
Also related to this, there have been changes to the spec regarding what happens if you use disable-output-escaping while writing to a temporary tree (that is, to a variable).
Processors are allowed to ignore disable-output-escaping entirely, but Saxon doesn't do that, except of course when the output isn't serialized. (That's because "escaping" is a serialization thing, and if you're not serializing, then you're not escaping anything, so there is nothing to disable).
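As a minimal sketch of the point above: the stylesheet below only produces unescaped markup when the processor serializes the result itself. The de element comes from the question; the record wrapper element is a hypothetical name, since the original input was not shown.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:output configures the serializer; it is only consulted when the
       processor itself serializes the result -->
  <xsl:output method="xml"/>

  <!-- "record" is a hypothetical wrapper element; "de" is from the question -->
  <xsl:template match="/record">
    <data>
      <!-- Writes the text of "de" without escaping < and &, but only if
           the result goes through the serializer -->
      <xsl:value-of select="de" disable-output-escaping="yes"/>
    </data>
  </xsl:template>
</xsl:stylesheet>
```

If the transformation is run through JAXP, for instance, sending the result to a StreamResult goes through the serializer, whereas a DOMResult does not, and in the latter case the attribute has nothing to act on.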

Related

Eliminate JavaScript from HTML with XSLT

I am trying to transform an HTML report into XML, but some JavaScript in the file is causing errors because of statements containing a less-than character (e.g., for(var i=0; i<els.length;i++) ). I thought I could eliminate the JavaScript with the following template, which should remove entire script elements:
<xsl:template match="script"/>
I assumed the XSLT processor would simply skip over the entire script nodes, but it's still throwing the same errors. I also tried adding this one:
<xsl:template match="script/text()"/>
No luck. If I manually remove all the javascript from the file, my transform works, but that's not practical as I need to create and run a daily automated process on these HTML files to extract some data in the HTML tables.
As a general rule, XSLT will only process well-formed XML input: it's not designed to process other formats like HTML.
However, XSLT will generally accept input from a parser that delivers a stream of events that looks sufficiently like an XML stream. This allows parsers like TagSoup and validator.nu to be used as a front-end to your XSLT processor.
Saxon packages this up with a parse-html() function that invokes TagSoup to parse HTML input and turn it into a DOM-like tree (actually an XDM tree) that it can process as if it came from XML.
validator.nu is a more up-to-date HTML parser than TagSoup, but you would have to do a little more work to integrate that.
Question was answered by Martin Honnen in the comments:
oxygenxml.com/doc/versions/18.1/ug-editor/tasks/… suggests there is an HTML import feature, so try whether that helps. There are also standalone applications such as HTML Tidy that I think you can use outside of the XSLT processing to first convert your HTML to XHTML.
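Once the input reaches the XSLT processor as well-formed XML (from an HTML parser front-end or from a prior conversion to XHTML), the empty-template idea from the question does work. The following is a sketch under that assumption. It matches on the local name because some HTML parsers and converters place elements in the XHTML namespace, in which case a plain match="script" would match nothing.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that has no more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop script elements, whether or not they are in the XHTML namespace -->
  <xsl:template match="*[local-name() = 'script']"/>
</xsl:stylesheet>
```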

I want to find a tag that is not closed with XPath

I want to find a tag that is not closed, using XPath, like this:
<figure class="img"><img class="immagine-in-linea-senza-cornice" width="16%" src="images/schema_1_fmt.jpeg" alt=""/>
I want to close the tag with an XSLT transformation.
XPath does not work directly on the input document, but on an abstract, tree-like representation of the document (e.g. XDM or DOM). In this model, opening and closing tags of an element are not represented at all. Instead, an element appears as a single entity in the tree.
Therefore, manipulating < and /> is completely out of the question for languages like XPath, because the concept of opening and closing tags is simply not implemented. I would argue that this abstraction is an advantage of the models, though.
Also, XSLT transformations normally take XML documents as input. If your document has unclosed elements, it will be rejected by any application that is only prepared to process XML.
In short, fix the XML document with a tool other than the combination of XSLT and XPath, and turn to XSLT once you have well-formed XML as input.

What to do when the XSL transformation namespace page is offline?

XSL requires this at the top of every stylesheet:
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
and throws an error if the URL in the namespace is not exactly right.
Today, "http://www.w3.org/1999/XSL/Transform" is offline. I cannot run any transformations. The transformation hangs and then returns "unexpected end of file" when the net request times out. If I change the URL in the namespace declaration to a random URL, the transformation fails with an error telling me that "http://www.w3.org/1999/XSL/Transform" is the required xsl namespace.
So how do I work around W3's site being down?
Using xmlns:something="..." declares an XML namespace. Such a namespace is merely a string, something that will help to attribute a unique meaning to element names like template or href, making sure multiple XML-based languages can be used in a single document without creating confusion as to its meaning.
Some of those namespaces are reserved for use by the W3C. The XSLT namespace is one of those. A proper XSLT processor will check if a stylesheet declares the correct namespace to make sure there can be no incorrect interpretation. The root element of the stylesheet should be in that XSLT namespace.
The actual namespace value is usually a URI (and most often a URL), since that normally makes a good unique identifier. However, it should never be used to resolve any online resource during XML processing. Parts of an HTTP URL, such as the scheme and host name, are case-insensitive, and URL encoding allows equivalent spellings (e.g. a space becomes %20), but no such normalization or equivalence is applied in XML namespace processing. A namespace in XML is nothing but a string that is always compared in its exact form, casing and everything.
So if an XSLT processor complains that some resource at a URL cannot be found, then either it's doing something it shouldn't do, or the problem has nothing to do with namespace processing.
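To illustrate that the namespace is matched purely as a string, here is a small sketch of a stylesheet that uses an arbitrary prefix (t instead of the customary xsl). It should run with no network connection at all, because nothing is ever fetched from the namespace URI.

```xml
<!-- The prefix "t" is arbitrary; only the namespace string matters,
     and it is compared character by character, never fetched -->
<t:stylesheet version="1.0"
    xmlns:t="http://www.w3.org/1999/XSL/Transform">
  <t:output method="text"/>
  <t:template match="/">
    <t:text>ok</t:text>
  </t:template>
</t:stylesheet>
```

Changing even one character of the string, for example writing transform in lower case, yields a different namespace, and the processor will reject the stylesheet whether it is online or offline.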
You're using Saxon, which most definitely understands the concept of a namespace. Its author is Michael Kay, who was also the editor of the XSLT 2.0 spec. But Saxon does support schema-aware XSLT processing. If a document specifies a schema location, a processor that validates against it will actually fetch the schema from that location. That's the difference from a namespace: DTD and XML Schema locations can definitely result in network activity.
So I advise you to check if...
the XML uses a DTD with external definitions and whether those are available;
the XML specifies a schema location and whether that location can be reached;
the stylesheet makes use of a schema or some other external resource and whether that's available.
Once you've found the cause, look into the use of XML catalogs in conjunction with the processor. An XML catalog will allow you to use local resources if they can't be resolved from their URIs.
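A catalog for this purpose could look roughly like the sketch below, which uses the OASIS XML Catalogs format. The remote identifiers and local paths are hypothetical placeholders, and how the catalog file is handed to the processor depends on the processor and its version.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Hypothetical DTD: map the remote system identifier to a local copy -->
  <system systemId="http://example.org/dtd/report.dtd"
          uri="dtd/report.dtd"/>
  <!-- Hypothetical schema location, redirected the same way -->
  <uri name="http://example.org/schema/report.xsd"
       uri="schema/report.xsd"/>
</catalog>
```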
Simple answer: http://www.w3.org/1999/XSL/Transform isn't used as a URL here; it's just a string. Had the W3C so decided, it could just as well have been 'ThisIsAnXsltStylesheet'. By convention, namespaces usually resemble URLs, but this isn't required.
So the fact that there's nothing at that URL isn't relevant to why your stylesheet is failing, and it certainly won't be the cause. If it were, nobody without an internet connection would ever be able to use XSLT, and the W3C's servers would be seriously overworked.
I'd recommend adding the first few lines of your XSLT into your question; it might shed some light on where your problem really is.

Preprocessing in XSLT

Is it at all possible to 'pre-process' in XSLT?
By preprocessing I mean updating the in-memory representation of the source tree.
Is this possible, or do I need to run multiple transforms?
Use case:
We have DocBook reference manuals for our clients, but certain clients need different 'skins' (different images etc.). So what I was hoping to do is rewrite the image fileref path depending on a parameter, then apply the rest of the normal DocBook XSL templates.
Expanding on Eamon's answer...
In the case of either XSLT 1.0 or 2.0, you'd start by putting the intermediate (pre-processed) result in an <xsl:variable> element, declared either globally (top-level) or locally (inside a template).
<xsl:variable name="intermediate-result">
  <!-- code to create pre-processed result, e.g.: -->
  <xsl:apply-templates mode="pre-process"/>
</xsl:variable>
In XSLT 2.0, the value of the $intermediate-result variable is a node sequence consisting of one document node (was called "root node" in XSLT/XPath 1.0). You can access and use it just as you would any other variable, e.g., select="$intermediate-result/doc"
But in XSLT 1.0, the value of the $intermediate-result variable is not a first-class node-set. Instead, it's something called a "result tree fragment". It behaves like a node-set containing one root node, but you're restricted in how you can use it. You can copy it and get its string-value, but you can't drill down using XPath, as in select="$intermediate-result/doc". To do that, you must first convert it to a first-class node-set using your processor's node-set() extension function. In Saxon 6.5, libxslt, and 4xslt, you can use exsl:node-set() (as in Eamon's answer). In MSXML, you'd need to use msxsl:node-set(), where xmlns:msxsl="urn:schemas-microsoft-com:xslt", and in Xalan, I believe it's called xalan:nodeset() (without the hyphen, but you'll have to Google for the namespace URI). For example: select="exsl:node-set($intermediate-result)/doc"
XSLT 2.0 simply abolished the result tree fragment, making node-set() unnecessary.
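Putting these pieces together for the image-path use case, a two-pass XSLT 1.0 stylesheet might look like the sketch below. It assumes a processor that supports exsl:node-set() and un-namespaced DocBook elements. The skin parameter and the path layout are hypothetical, and the final identity template is only a stand-in for the real second-pass templates.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- Hypothetical parameter: directory holding the client-specific images -->
  <xsl:param name="skin" select="'default'"/>

  <!-- Pass 1: identity copy in the pre-process mode -->
  <xsl:template match="@*|node()" mode="pre-process">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="pre-process"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1: prefix every image path with the skin directory -->
  <xsl:template match="imagedata/@fileref" mode="pre-process">
    <xsl:attribute name="fileref">
      <xsl:value-of select="concat($skin, '/', .)"/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="/">
    <xsl:variable name="intermediate-result">
      <xsl:apply-templates mode="pre-process"/>
    </xsl:variable>
    <!-- Pass 2: select the children rather than the root node, so that
         this template is not matched again -->
    <xsl:apply-templates select="exsl:node-set($intermediate-result)/*"/>
  </xsl:template>

  <!-- Pass 2 stand-in: replace with the real output templates -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 2.0 the same structure works without the exsl namespace: the second apply-templates can select $intermediate-result/* directly.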
This is not possible with standards-compliant XSLT 1.0. It is, however, possible in every actual implementation I've used, although the extensions needed differ by engine. It is also possible in standard XSLT 2.0 (which is in any case much easier to work with, so if you can, just use that).
If your XSLT processor supports EXSLT, the exsl:node-set() function does what you're looking for. MSXML has an identically named extension function as well, but in a different namespace URI, so the two are unfortunately not trivially interchangeable.
Since you are trying to generate slightly different output from the same DocBook XML source, you might want to look into the "profiling" (conditional markup) support in DocBook XSL stylesheets. See Chapter 26 in DocBook XSL: The Complete Guide by Bob Stayton:
Profiling is the term used in DocBook to describe conditional text. Conditional text means you can create a single XML document with some elements marked as conditional. When you process such a document, you can specify which conditions apply for that version of output, and the stylesheet will include or exclude the marked text to satisfy the conditions. This feature is useful when you need to produce more than one version of a document, and the versions differ in minor ways.
For example, to use different images for, say, Windows and Mac versions of the same document, you might have a DocBook XML fragment like this:
<figure>
  <title>The Foo dialog</title>
  <mediaobject>
    <imageobject os="windows">
      <imagedata fileref="screenshots/windows/foo.png"/>
    </imageobject>
    <imageobject os="mac">
      <imagedata fileref="screenshots/mac/foo.png"/>
    </imageobject>
  </mediaobject>
</figure>
Then, you would use the profiling-enabled versions of the DocBook XSL stylesheets with the profile.os parameter set to windows or mac.
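One way to set that parameter is a small customization layer that imports the profiling-enabled HTML stylesheet. This is a sketch only: the import URI below is an assumption about where the DocBook XSL distribution is reachable (typically it is resolved to a local copy through a catalog), and the parameter can equally be passed on the command line.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed location of the profiling-enabled HTML stylesheet;
       adjust to wherever the DocBook XSL distribution is installed -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/profile-docbook.xsl"/>

  <!-- Keep only content marked os="windows" (plus unmarked content) -->
  <xsl:param name="profile.os" select="'windows'"/>
</xsl:stylesheet>
```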
Maybe you should use XSLT "OOP" methods here. Put the templates common to all clients in one stylesheet, and create a stylesheet for each client with specific templates that override the common ones. Import the common stylesheet into each specific one with xsl:import, and you need only a single processing pass, run with the stylesheet for the client in question.
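A per-client stylesheet along those lines could be sketched as follows. The file names common.xsl and client-acme.xsl, the image.dir parameter, and the skins/acme path are all hypothetical, and the example assumes that common.xsl handles imagedata through an ordinary template that can be overridden.

```xml
<!-- client-acme.xsl: hypothetical per-client stylesheet -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must precede every other top-level element;
       common.xsl is a hypothetical shared stylesheet -->
  <xsl:import href="common.xsl"/>

  <!-- Overrides a parameter of the same name declared in common.xsl -->
  <xsl:param name="image.dir" select="'skins/acme'"/>

  <!-- Overrides the common template for imagedata: declarations in the
       importing stylesheet have higher import precedence -->
  <xsl:template match="imagedata">
    <imagedata fileref="{concat($image.dir, '/', @fileref)}"/>
  </xsl:template>
</xsl:stylesheet>
```

Running the transformation with client-acme.xsl as the entry point picks up everything from common.xsl except the declarations overridden here.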

XSLT - how to deal with <![CDATA[

I'm trying to output some XML using XSLT, however I've just come across this:
<description><![CDATA[<p>Using Money – recognise coins, getting change, paper money etc. A PowerPoint resource containing colour coded levels to suit different abilities – special needs. Self checking and interactive.</p>]]></description>
How do I output this as actual HTML, so that the result contains a real <p> element rather than the escaped &lt;p&gt;?
You can use disable-output-escaping. Beware, though, that if the input value is not well-formed or valid, the output won't be either.
<xsl:value-of select="description" disable-output-escaping="yes"/>
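XSLT handles data that has already been parsed by the XML parser. The parser treats the content of a CDATA section as plain text, so the XSLT processor sees only a text node; the CDATA markers themselves are gone by then. You might need to do some pre-processing to remove the CDATA tags before turning over the XML to XSLT.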
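For completeness, a sketch of a full stylesheet built around the disable-output-escaping suggestion is shown below. The div wrapper and its class name are hypothetical, and the unescaped output only appears when the processor serializes the result itself.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Matches description whether its content was written as
       <![CDATA[<p>...</p>]]> or as &lt;p&gt;...&lt;/p&gt;: after parsing,
       both are the same text node -->
  <xsl:template match="description">
    <div class="description">
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Text outside description elements still falls through to the built-in templates and is copied to the output as plain text, so a real stylesheet would add templates for the surrounding elements.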