What to do when the XSL transformation namespace page is offline?

XSL requires this at the top of every stylesheet:
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
and throws an error if the URL in the namespace is not exactly right.
Today, "http://www.w3.org/1999/XSL/Transform" is offline. I cannot run any transformations. The transformation hangs and then returns "unexpected end of file" when the net request times out. If I change the URL in the namespace declaration to a random URL, the transformation fails with an error telling me that "http://www.w3.org/1999/XSL/Transform" is the required xsl namespace.
So how do I work around W3's site being down?

Using xmlns:something="..." declares an XML namespace. Such a namespace is merely a string, something that will help to attribute a unique meaning to element names like template or href, making sure multiple XML-based languages can be used in a single document without creating confusion as to its meaning.
Some of those namespaces are reserved for use by the W3C. The XSLT namespace is one of those. A proper XSLT processor will check if a stylesheet declares the correct namespace to make sure there can be no incorrect interpretation. The root element of the stylesheet should be in that XSLT namespace.
For an actual namespace value, you'd usually have a URI (and most often a URL) since that's normally a good unique identifier. However, this should never be used to actually resolve to any online resources during XML processing. Whereas HTTP URLs are normally treated in a case-insensitive manner and may make use of URL encoding for characters (e.g. space becomes %20), such resolution or equality of URLs is not checked in XML namespace processing. A namespace in XML is nothing but a string that's always checked in its exact form, casing and everything.
So if an XSLT processor complains that some resource at a URL cannot be found, then either it's doing something it shouldn't do, or the problem has nothing to do with namespace processing.
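To make the "just a string" point concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The choice of Python, the helper name root_namespace and the two sample stylesheets are assumptions for illustration; Saxon is not involved. The parsing happens entirely in memory, and a URI that differs only in letter case is treated as a different namespace.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

def root_namespace(xml_text):
    """Return the namespace name of the document element ('' if it has none)."""
    tag = ET.fromstring(xml_text).tag  # '{namespace}local' or just 'local'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

# Hypothetical sample stylesheets; the second differs only in the case of the host.
good = '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
bad = '<xsl:stylesheet version="1.0" xmlns:xsl="http://WWW.W3.ORG/1999/XSL/Transform"/>'

print(root_namespace(good) == XSLT_NS)  # True
print(root_namespace(bad) == XSLT_NS)   # False: compared as an exact string
```

An XSLT processor performs the same kind of exact comparison when it decides whether the root element is in the XSLT namespace, which is why a "random URL" in the declaration is rejected.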
You're using Saxon, which most definitely understands namespaces: its author, Michael Kay, was also the editor of the XSLT 2.0 spec. Saxon does, however, support schema-aware XSLT processing. If a document specifies a schema location, a processor that uses it for validation will actually fetch the schema from that location. That's the difference from a namespace: DTD and XML Schema locations can definitely result in network activity.
So I advise you to check if...
the XML uses a DTD with external definitions and whether those are available;
the XML specifies a schema location and whether that location can be reached;
the stylesheet makes use of a schema or some other external resource and whether that's available.
Once you've found the cause, look into using XML catalogs in conjunction with the processor. An XML catalog maps public identifiers and URIs to local copies, so the processor can use those instead of fetching the resources over the network.

Simple answer: http://www.w3.org/1999/XSL/Transform isn't used as a URL here, it's just a string. Had the W3C decided so, there's no reason it couldn't have been 'ThisIsAnXsltStylesheet'. By convention, namespace names usually resemble URLs, but nothing is ever fetched from them.
So, the fact that there's nothing at that URL isn't relevant to why your stylesheet is failing, and certainly won't be the cause. Logically speaking, if that were the case, then nobody without an internet connection would ever be able to use XSLT, and the W3C's servers would be seriously overworked.
I'd recommend adding the first few lines of your XSLT into your question; it might shed some light on where your problem really is.

Related

How to break caching on exist-db of included XSLs in Transform

I have a large set of XSLs that we recently went through and implemented a shared XSL template with common bits. We included an xsl:include in all the main XSLs now to pull these in. We had no issues at first until we started to make changes to the shared XSL.
For information, the whole system is web based, calling queries to dynamically format documents in the database given different XSLs through XSL FO and RenderX.
The main transform is:
let $fo := util:expand(transform:transform($articles, doc("/db/Customer/data/edit/xsl/Custbatch.xsl"), $parameters))
That XSL (Custbatch.xsl) has:
<xsl:include href="Custshared.v1.xsl"/>
If we make an edit to "Custshared.v1.xsl", it is not reflected in the result, apparently because "Custshared.v1.xsl" is being cached. We know this because, as you can see, the name now includes "v1": if we make a change and update all the references from v1 to v2, it all works. But this seems a bit ridiculous, as it means we have to change the 18 XSLs that include this XSL or do something silly like restart the database.
So, what am I missing in the setup or in controller.xql (which has the following on all unmatched paths) to stop things from being cached? I assume the caching is all internal, so this setting likely does not matter. Is there some other setting in the config that does?
<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
<cache-control cache="no"/>
</dispatch>
In reading the document here: http://exist-db.org/exist/apps/doc/xsl-transform.xml, it states:
"The stylesheet will be compiled into a template using the standard Java APIs (javax.xml.transform). The template is shared between all instances of the function and will only be reloaded if modified since its last invocation."
However, if I change an included XSL, it is not being used.
Update #1
I even went as far as creating a query that returns the XSL that is included, then I use:
<xsl:include href="http://localhost/get-include-xsl.xq"/>
This does work, as formatting is not broken, but changing the underlying XSL yields the same result. So even that XQuery result is cached.
Update #2
And yes, a few simple tests confirm it.
If I make any change to the root template (like add a meaningless space) and run, it does include the changes made in the include. If I only change the included XSL, no changes happen.
So, lacking anything else, we could always write an XQuery that touches all the main templates after a change is made to the included template. That seems so wrong as a workaround.
Update #3
So the workaround we are currently using is that we have an unused "variable" in the XSL (version) and when we update the shared template, we execute that query which basically updates the value in that variable. At least it's only one XQuery and maybe we should attach to a trigger.
There is a setting in $exist-db-root$/conf.xml for the XSL transformer where you can turn off caching: <transformer class="net.sf.saxon.TransformerFactoryImpl" caching="no"> (The default is 'yes')

Ordering of namespaces

I've created an XSLT stylesheet document. Within this document I create a new XML document as stated below:
...
<CREATE_REQ
xsi:schemaLocation="http://fcubs.ofss.com/service/aServices theService.xsd"
xmlns="http://fcubs.ofss.com/service/aServices"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
After transformation (see below) the ordering of the namespaces is different. A normal XML parser can handle this, and it is normally no problem. The problem in my case is that the receiving application can't handle it, so the order of the namespace declarations must not change.
<CREATE_REQ xmlns="http://fcubs.ofss.com/service/aServices"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://fcubs.ofss.com/service/aServices theService.xsd">
...
Is there a function or declaration that keeps the order of the namespace declarations unchanged?
If the receiving application can't handle it then it needs to be fixed. Whoever wrote it doesn't seem to have grasped what XML is all about. Fix the receiving application, or throw it in the bin where it belongs.
"After transformation (see below) the ordering of the namespaces is different."
No, the ordering is exactly the same. According to the W3C XPath 1.0 Data Model:
The attribute nodes and namespace nodes of an element occur before the children of the element. The namespace nodes are defined to occur before the attribute nodes.
This means that although in the provided XML fragment the attribute xsi:schemaLocation appears to precede the namespace declarations, in the data model it follows them.
Therefore, the produced output doesn't change the ordering of namespaces and attributes of the original XML document.
Producing an XML document where an attribute precedes a namespace node would violate the above quoted definition, therefore a compliant XSLT processor wouldn't produce such a document.
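As a rough check of the claim that both serializations carry the same information, the following Python sketch (standard library only) parses the two forms of the start tag and compares them. The elements are self-closed here so that the fragments are well-formed on their own; that, and the variable names, are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The start tag as written in the stylesheet (attribute first).
as_written = '''<CREATE_REQ
    xsi:schemaLocation="http://fcubs.ofss.com/service/aServices theService.xsd"
    xmlns="http://fcubs.ofss.com/service/aServices"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'''

# The start tag as produced by the transformation (declarations first).
as_produced = '''<CREATE_REQ xmlns="http://fcubs.ofss.com/service/aServices"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://fcubs.ofss.com/service/aServices theService.xsd"/>'''

a = ET.fromstring(as_written)
b = ET.fromstring(as_produced)

same_name = a.tag == b.tag
same_attributes = a.attrib == b.attrib
print(same_name, same_attributes)  # True True

# Namespace declarations are not reported as attributes; only
# xsi:schemaLocation (in expanded form) shows up here.
print(sorted(a.attrib))
```

A namespace-aware parser sees the same element either way, which is why a receiver that depends on the textual order is the part that needs fixing.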

XSLT namespaces URL

What does a namespace do in XSLT when a url is provided such as:
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
does this attempt to make a connection to the internet?
No; it just so happens that XML namespace names (see the W3C XML Namespaces specification) are URIs.
They work in exactly the same way that namespaces in other languages do; they help uniquely identify things with the same names but in different contexts.
You can prove that no attempt is made to retrieve the resource by running an HTTP monitor on your machine while loading or using the XSL transformation - this answer has many good suggestions.
No.
Whatever the namespace is, in an XSD, an XSLT or any other XML file, the namespace itself triggers no internet request.
The namespace is used to qualify your XML elements.
When you conduct an XSLT transformation, the XSLT engine validates the XSLT file. It performs many checks, such as the root element being named stylesheet, etc. The engine must also be able to discern literal result elements (like <table>) from XSLT-specific elements (like <xsl:stylesheet>).
An element is recognized as XSLT-specific when it resides in the XSLT namespace. The value of the URI you posted (http://www.w3.org/1999/XSL/Transform) is simply a convention that makes it clear we're talking about XSLT. The prefix being defined (xsl) is the prefix used in the XSLT file to qualify the XSLT elements. You can use another prefix if you choose, provided you map it to the XSLT namespace.
Note that it's actually just a URI (an identifier), not a URL (a locator). There is no HTTP request to locate anything, it just identifies an abstract concept (in this case "XSLT").
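The remark that the prefix is arbitrary, as long as it maps to the XSLT namespace, can be checked with a small Python sketch using the standard library parser. The prefix t and the two one-line stylesheets are hypothetical examples, not taken from the question.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, two different prefixes (the prefix "t" is made up).
with_xsl = '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
with_t = '<t:stylesheet version="1.0" xmlns:t="http://www.w3.org/1999/XSL/Transform"/>'

tag_xsl = ET.fromstring(with_xsl).tag
tag_t = ET.fromstring(with_t).tag

print(tag_xsl == tag_t)  # True: the prefix is not part of the element's name
print(tag_xsl)           # {http://www.w3.org/1999/XSL/Transform}stylesheet
```

Both documents are parsed from in-memory strings, so no network access takes place at any point.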

How to deal with presence or not of xml namespaces using xslt

I have some XML/TEI documents, and I'm writing an XSLT 2.0 stylesheet to extract their content.
Almost all of the TEI documents have no namespace, but one has the default namespace (xmlns="http://www.tei-c.org/ns/1.0").
So all documents look the same, with unqualified tags like <TEI> or <teiHeader>, but when I try to extract the content, everything works with the non-namespaced documents and nothing (of course) is extracted from the namespaced document.
So I used the attribute xpath-default-namespace="http://www.tei-c.org/ns/1.0", and now (of course) the only document that works is the namespaced one.
I can't edit the documents at all, so what I'm asking is whether there's a way to change xpath-default-namespace dynamically, so that XPaths like //teiHeader work with both namespaced and non-namespaced documents.
If you are using XSLT 2.0, then you do have the option for a wildcard match for the namespace in a node test.
e.g. //*:teiHeader
http://www.w3.org/TR/xpath20/#node-tests
"A node test can also have the form *:NCName. In this case, the node test is true for any node of the principal node kind of the step axis whose local name matches the given NCName, regardless of its namespace or lack of a namespace."
This is functionally equivalent to Dimitre Novatchev's example, but a little shorter/easier to type.
However, this will only work in XSLT/XPath 2.0.
There isn't really a clean way to do precisely what you are asking. However, there are workarounds available. You could use a two stage process whereby you strip the namespace from the document if it's present and then pass it through the same templates for all content.
There is a good example (in XSLT 1) of doing this in the DocBook XSLT. Take a look at html/docbook.xsl and common/stripns.xsl
Basically, you would need to assign the result of stripping the namespace to a variable and then call your existing templates (for the non namespaced) content but select the variable.
It is ugly, but this gives you what you want:
//*[name()='teiHeader']
If you use this style for all location steps in any XPath expression, the XPath expressions will select elements only by name, regardless of whether or not the elements belong to any namespace.
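The idea behind selecting by name regardless of namespace can be sketched outside XSLT as well. The Python example below (standard library only) compares local names by hand; the helper names local_name and find_by_local_name and the two tiny sample documents are assumptions made for illustration, not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def find_by_local_name(root, name):
    """Return all elements whose local name matches, in any or no namespace."""
    return [el for el in root.iter()
            if isinstance(el.tag, str) and local_name(el.tag) == name]

# Hypothetical miniature documents: one without and one with the TEI namespace.
plain = "<TEI><teiHeader>a</teiHeader></TEI>"
namespaced = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>b</teiHeader></TEI>'

for doc in (plain, namespaced):
    hits = find_by_local_name(ET.fromstring(doc), "teiHeader")
    print([el.text for el in hits])  # ['a'] for the first, ['b'] for the second
```

As with the XPath versions, the trade-off is that elements with the same local name in unrelated namespaces would match too.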

Using xpath/xslt to get the anchor part of the page url

I'm writing a template in XSLT with XPath 1.0 and need to access the anchor part of the URL of the current page. For example, the URL:
http://localhost/destinations/london.aspx#accommodation
I need to be able to get #accommodation and assign it to a variable. I realise I'm somewhat limited by using xpath 1.0 - has anyone got any experience doing this?
Thanks, Adam
Why is this an xpath problem at all? A URL is not an XML document, ergo xpath does not apply.
XSLT is completely unaware of any state like page location. Guessing a bit at what you're trying to do, you're probably best off getting #accommodation through string manipulation or the framework in the layer which calls the XSLT, and passing the value in as a param.
OTOH maybe this is nonsensical and your question just needs clarification.
As @Annakata said, this is not an XPath problem. It doesn't seem to be an XSLT problem either, though I may be mistaken. If it is related to XSLT string parsing, then what you need is something like this question talks about.
What you probably need instead is JavaScript to get the current URL (document.location) and then perform JavaScript string parsing on it.
There is no way in standard XSLT to access the URL of a document: http://www.dpawson.co.uk/xsl/sect2/nono.html#d1974e804
Some vendors might provide this information via custom properties, but then you would be dependent on the XSLT processor.
If you have managed to get the URL into the XSLT in some fashion, then I suggest you will have to resort to simple string manipulation to get the anchor.
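As a sketch of the "do it in the calling layer" suggestion, the anchor can be split off before the transformation runs and then handed to the stylesheet as a parameter. The example uses Python's standard urllib.parse purely for illustration; the function name anchor_of is made up, and it assumes the calling code actually has the full URL (browsers normally do not send the fragment to the server).

```python
from urllib.parse import urlsplit

def anchor_of(url):
    """Return the anchor including its leading '#', or '' if there is none."""
    fragment = urlsplit(url).fragment
    return "#" + fragment if fragment else ""

print(anchor_of("http://localhost/destinations/london.aspx#accommodation"))  # #accommodation
print(repr(anchor_of("http://localhost/destinations/london.aspx")))          # ''
```

If the whole URL is passed into the stylesheet instead, the XPath 1.0 function substring-after with '#' as its second argument does the equivalent string manipulation there.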