Upgrading XSLT 1.0 to XSLT 2.0

What is involved in upgrading from XSLT 1.0 to 2.0?
1 - What are the possible reasons for upgrading?
2 - What are the possible reasons for NOT upgrading?
3 - And finally, what are the steps to upgrading?
I'm hoping for an executive summary--the short version :)

1 - What are the possible reasons for upgrading?
If you are an XSLT programmer, you'll benefit greatly from the more convenient and expressive XSLT 2.0 language, together with XPath 2.0 and the new XDM (XPath Data Model).
You may want to watch this XSLT 2.0 Pluralsight course to get a firm and systematic understanding of the power of XSLT 2.0.
You have:
Strong typing and all XSD types available.
The ability to define your own (schema) types.
The XPath 2.0 sequence type, which has no counterpart in XPath 1.0.
The ability to define and write functions in pure XSLT -- the xsl:function instruction.
Range variables in XPath expressions (the for clause).
Much better and more powerful string processing -- XPath 2.0 supports regular expressions in its tokenize(), matches() and replace() functions.
Regular-expression support in XSLT 2.0 itself -- the new xsl:analyze-string instruction with its xsl:matching-substring and xsl:non-matching-substring children.
More convenient, powerful and expressive grouping: the xsl:for-each-group instruction.
A lot of new, very powerful XPath 2.0 functions -- such as the functions on date, time and duration, just to name a few.
The new XPath operators and expressions intersect, except, is, >>, <<, some, every, instance of, castable as, and more.
The general comparison operators >, <, etc. now work on any ordered value type (not only on numbers, as in XPath 1.0).
New, safer value comparison operators: lt, le, eq, gt, ge, ne.
The XPath 2.0 to operator, which allows, for example, xsl:for-each select="1 to $N".
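To make the difference between the general comparison operators and the new value comparison operators concrete, here is a small Python model of their XPath 2.0 semantics, with a sequence represented as a Python list of atomic values. It is a simplified sketch: it ignores atomization, xs:untypedAtomic casting and type promotion, and the function names are hypothetical.

```python
from typing import Any, List, Optional

# An XPath sequence modeled as a flat Python list of atomic values.
Sequence = List[Any]


def general_eq(left: Sequence, right: Sequence) -> bool:
    """Model of XPath `left = right` (general comparison).

    Existential semantics: true if ANY item on the left equals ANY item on
    the right, so an empty operand always gives false.
    """
    return any(a == b for a in left for b in right)


def value_eq(left: Sequence, right: Sequence) -> Optional[bool]:
    """Model of XPath 2.0 `left eq right` (value comparison).

    Each operand must be at most one item: an empty operand gives the empty
    sequence (modeled as None), and more than one item is a type error.
    """
    if not left or not right:
        return None
    if len(left) > 1 or len(right) > 1:
        raise TypeError("XPTY0004: value comparison needs single items")
    return left[0] == right[0]


def to_range(start: int, end: int) -> Sequence:
    """Model of XPath 2.0 `start to end`: inclusive, and empty when start > end."""
    return list(range(start, end + 1))


if __name__ == "__main__":
    print(general_eq([1, 2], [2, 3]))  # True: 2 occurs on both sides
    print(general_eq([], [1]))         # False
    print(value_eq([3], [3]))          # True
    print(to_range(1, 5))              # [1, 2, 3, 4, 5]
    print(to_range(5, 1))              # []
    try:
        value_eq([1, 2], [1])
    except TypeError as err:
        print(err)                     # XPTY0004: value comparison needs single items
```

Because of the existential semantics, (1, 2) = (2, 3) and (1, 2) != (2, 3) are both true in XPath, which is one reason the value comparison operators are described as safer: eq refuses to compare sequences at all.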
These and many other improvements and new features significantly increase the productivity of any XSLT programmer, allowing XSLT 2.0 development to be finished in a small fraction of the time needed to develop the same modules in XSLT 1.0.
Strong typing allows many errors to be caught at compile time and to be corrected immediately. For me this strong type-safety is the biggest advantage of using XSLT 2.0.
2 - What are the possible reasons for NOT upgrading?
It is often possible, reasonable and cost-efficient to leave existing, legacy XSLT 1.0 applications untouched and to continue using them with XSLT 1.0, while at the same time developing only new applications using XSLT 2.0.
Your management, or any other non-technical reasons.
Having a lot of legacy XSLT 1.0 applications written in a poor style (e.g. using DOE (disable-output-escaping) or extension functions), which would now need to be rewritten and refactored.
Not having an XSLT 2.0 processor available.
3 - And finally, what are the steps to upgrading?
Change the version attribute of the xsl:stylesheet or xsl:transform element from "1.0" to "2.0".
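Remove any calls to xxx:node-set() extension functions (e.g. exsl:node-set() or msxsl:node-set()); in XSLT 2.0 a variable holding a temporary tree can be navigated directly.
Remove any DOE (disable-output-escaping).
Be ready for the surprise that xsl:value-of now outputs all items of a sequence (separated by a single space by default), not just the first.
Try to use the new xsl:sequence instruction as much as possible: use it to replace any xsl:copy-of instructions, and use it instead of xsl:value-of whenever the output should be something other than a string or a text node.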
Test extensively.
When the testing has verified that the code works as expected, start refactoring (if deemed necessary). It is a good idea to declare types for any variables, parameters, templates and functions. Doing so may reveal new, hidden errors and fixing them increases the quality of your code.
Optionally, decide which named templates to rewrite as xsl:function.
Decide whether you still need the extension functions used in the old version, or whether you can easily rewrite them using the new, powerful capabilities of XSLT 2.0.
Final remarks: not all of the above steps are necessary; one can stop and declare the migration successful once testing finds zero bugs. It is much cleaner to start using all XSLT 2.0/XPath 2.0 features in new projects.

Dimitre's answer is very comprehensive and 100% accurate (as always) but there is one point I would add. When upgrading to a 2.0 processor, you have a choice of leaving the version attribute set to "1.0" and running in "backwards compatibility mode", or changing the version attribute to "2.0". People often ask which approach is recommended.
My advice is, if you have a good set of tests for your stylesheets, take the plunge: set version="2.0", run the tests, and if there are any problems, fix them. Usually the problems will be code that was never quite right in the first place and only worked by accident. But if you don't have a good set of tests and are concerned about the reliability of your workload, then leaving version="1.0" is a lower-risk approach: the processor will then emulate all the quirks of XSLT 1.0, such as xsl:value-of ignoring all but the first item, and the strange rules for comparing numbers with strings.
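The xsl:value-of difference mentioned above can be modeled in a few lines of Python. This is a simplified sketch with hypothetical function names: the input list stands for the string values of the selected nodes, already in document order, and details such as merging adjacent text nodes are ignored. In XSLT 2.0 the separator attribute of xsl:value-of controls the joining string, and select="$items[1]" restores the first-item-only result explicitly.

```python
from typing import Any, List


def value_of_1_0(selected: List[Any]) -> str:
    """Model of XSLT 1.0 (or backwards-compatible) <xsl:value-of select="..."/>.

    Only the string value of the FIRST selected node (document order) is
    output; an empty node-set yields the empty string.
    """
    return str(selected[0]) if selected else ""


def value_of_2_0(selected: List[Any], separator: str = " ") -> str:
    """Model of XSLT 2.0 <xsl:value-of select="..."/> on a sequence.

    Every item contributes its string value, joined by the separator,
    which defaults to a single space when the select attribute is used.
    """
    return separator.join(str(item) for item in selected)


if __name__ == "__main__":
    items = ["apple", "banana", "cherry"]  # e.g. the string values of three <item> elements
    print(value_of_1_0(items))             # apple
    print(value_of_2_0(items))             # apple banana cherry
    print(value_of_2_0(items, ", "))       # apple, banana, cherry
```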

Related

Fastest XSLT 1.0 and XSLT 2.0 processors in terms of performance

Which XSLT processor would you suggest, for XSLT 1.0 and XSLT 2.0 respectively, in terms of performance? Say one has a huge XML file and would like to make the conversion faster. Does the language the processor is implemented in play a very important role?
For instance, I would like to try a Go implementation, if anyone can suggest one, or a C processor.
To be clear: I have written things in both XSLT 1.0 and XSLT 2.0, so I need a way to make the processing faster for each specification.
The implementation language for the processor isn't a very important factor. Sure, different languages pose different challenges: if you write an XSLT processor in C (or any other language without garbage collection) then you have to put a lot of effort into memory management, but that affects the cost and effort of writing the processor more than the performance it ends up achieving.
Performance isn't one dimensional. Different processors will compare differently depending on the actual stylesheet. In addition, it depends what you measure: processor start-up time, stylesheet compile time, source document parsing/building time, single-threaded performance or multi-threaded performance. There's no substitute for making your own measurements tailored to your particular workload.
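The advice to measure each phase separately can be sketched as a small timing harness. This uses only Python's standard library: ElementTree parsing stands in for source document building, and a hypothetical query function stands in for the transformation. It is not a benchmark of any XSLT processor; with a real processor you would time its own compile, parse and transform steps in the same way.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def time_phases(xml_text: str, query: Callable[[ET.Element], object],
                repeats: int = 5) -> Dict[str, float]:
    """Time tree building and processing separately.

    Returns the best (minimum) wall-clock time in seconds for each phase
    over `repeats` runs, which reduces the effect of one-off warm-up costs.
    """
    parse_times, query_times = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        root = ET.fromstring(xml_text)   # phase 1: parse and build the tree
        t1 = time.perf_counter()
        query(root)                      # phase 2: process the built tree
        t2 = time.perf_counter()
        parse_times.append(t1 - t0)
        query_times.append(t2 - t1)
    return {"parse": min(parse_times), "query": min(query_times)}


if __name__ == "__main__":
    # Hypothetical workload: 10,000 <item> elements and a query that counts them.
    doc = "<root>" + "<item>x</item>" * 10_000 + "</root>"
    result = time_phases(doc, lambda root: len(root.findall("item")))
    print({phase: f"{secs * 1000:.2f} ms" for phase, secs in result.items()})
```

Reporting the phases separately makes it visible whether the time goes into building the tree or into processing it, which in turn suggests whether a different processor, a different way of running it, or a change to the stylesheet is the more promising fix.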
If the performance of your workload isn't meeting requirements, then seeing how it fares on a different XSLT processor is one of the actions you can take to try and solve the problem. But there are many other actions you can take as well, and some of them may be less disruptive. Before trying a different XSLT processor, check to make sure that XSLT processing is actually a critical component of your overall system performance: I've seen plenty of cases where I've been called in to look at perceived XSLT performance issues and it turned out on investigation that the issues had nothing to do with XSLT processing. And where it is XSLT processing, sometimes a simple change to your XSLT code, or to the way you run the XSLT processor, will make a vast saving.

XPath reference

I don't know the proper terminology for what I'm looking for, but what I am looking for is a complete reference to the statements that can go between the double quotes: things like *, node(), @*, and all the ones listed here, plus any others that exist.
<xsl:template match="*">
The answer I linked to provides some detail, but not enough. For instance, that answer says "can be applied to any element" about the example I gave above, but what is considered an "element" in XPath? What does node() include? What statements include attributes? etc.
I have searched the references here and here and I'm slowly making my way through this book, but I'm not seeing the info I want, which is basically a consolidated (and hopefully exhaustive) list of statements and what they mean. Does such a list exist and if so where is it? Free is nice but not necessary.
In XSLT, the match pattern accepts a subset of XPath expressions. So the set of expressions which can appear as the value of a match attribute is governed by two specs: the XPath specification itself, which defines the language of which match-patterns are a subset, and the XSLT specification, which defines the subset.
If you are working with XSLT 1.0, the authoritative account is given by the XPath 1.0 specification and the XSLT 1.0 specification. It is in the nature of XPath that the language is infinite in size; there cannot be an exhaustive list of legal patterns. Instead, the set of legal patterns is defined by a context-free grammar given in the XSLT and XPath specs.
If you are working with XSLT 2.0, the relevant specs are XPath 2.0 (Second Edition) and XSLT 2.0. Again, the definition of legal match patterns uses a grammar defined partly in the XSLT spec and partly in the XPath spec.
You ask: what is considered an "element" in XPath? What does node() include? What statements include attributes?
Both versions of XPath define how to evaluate expressions against instances of the XPath data model; it is the data model that specifies that all element nodes are nodes, but not all nodes are element nodes (and so on). The data model for XPath 1.0 is simpler and in general easier to understand, but its definition is rather informal and has what some readers regard as some problematic gaps and contradictions; it is defined in section 5 of the XPath 1.0 specification. The XPath 2.0 data model is used not only by XPath and XSLT but also by XQuery; it is defined in a spec called, unsurprisingly, XQuery 1.0 and XPath 2.0 Data Model (XDM).
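As a rough illustration of what the node tests *, node() and @* select, here is a Python sketch using the standard library's xml.dom.minidom. The DOM is not the XPath data model (for example, the DOM has separate CDATA section nodes, while XPath sees only text nodes), so treat this as an analogy with hypothetical helper names, not an XPath implementation. It does mirror one key rule of the data model: attributes are not children of their element.

```python
from typing import List
from xml.dom import Node, minidom

# Hypothetical sample: an element with an attribute, a comment, a child element,
# a text node and a processing instruction.
DOC = '<book id="b1"><!-- note --><title>XSLT</title>text<?pi data?></book>'


def child_star(node: Node) -> List[Node]:
    """Like child::* (abbreviated `*`): element children only."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def child_node(node: Node) -> List[Node]:
    """Like child::node() (abbreviated `node()`): children of every kind.

    Elements, text, comments and processing instructions are included;
    attributes never are, because in XPath attributes are not children.
    """
    return list(node.childNodes)


def attribute_star(node: Node) -> List[Node]:
    """Like attribute::* (abbreviated `@*`): the attribute nodes of an element."""
    if node.nodeType != Node.ELEMENT_NODE:
        return []
    attrs = node.attributes
    return [attrs.item(i) for i in range(attrs.length)]


if __name__ == "__main__":
    book = minidom.parseString(DOC).documentElement
    print([n.nodeName for n in child_star(book)])   # ['title']
    print(len(child_node(book)))                    # 4
    print([a.name for a in attribute_star(book)])   # ['id']
```

So match="*" matches elements only, match="node()" matches elements, text, comments and processing instructions, and attributes have to be asked for explicitly with @* (or attribute::*).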
A good book on XSLT will typically also provide a good account of the data model; depending on the style of the book, of course, it may be more or less exhaustive and be more or less careful about corner cases. There are several good books, and I have heard people say good things about Doug Tidwell's book. But the one XSLT book I have found on every serious XSLT programmer's shelf is the one written by Michael Kay. (Actually, most serious XSLT programmers I know own two: the XSLT 1.0 version and the XSLT 2.0 version.)
From the wording of your question, it sounds as if you may also want to read some systematic introductions to XML itself.
I'm reminded a little of the manager who asks you for plans to rid the world of cancer, and insists they must be presented tomorrow on a single sheet of A4 paper. You've discovered that you need more technical detail than the simpler "single-sheet" references provide, but you still value their simplicity: you are asking for completeness and brevity at the same time, and that's a tough order.
I think you're actually well on the way to answering your own question. You've discovered that you need a better understanding of the data model, as this underpins the semantics of all the XPath expressions and XSLT patterns that you need to write. As Michael Sperberg-McQueen points out, there's an admirably concise but lamentably informal description of the model in the XPath 1.0 specification, and an admirably detailed but lamentably verbose description in the XDM spec linked from XSLT 2.0 and XQuery 1.0. Equally, you've also discovered that any short reference to the XPath (or pattern) grammar and semantics is going to be incomplete, but any longer description is going to take time to absorb. So you know the choices you have to make!
An element is an XML structure like <Something>, an attribute looks like Something="value", and both can be referred to as nodes.
I think a good reference is the XPath specification itself. It takes a while to read it all and longer to understand it, but it's a nice place to pick up some terminology to formulate more specific questions.

XPath expression vs. XSL variable

Does the performance of XSLT improve when an xsl:variable is used instead of an XPath expression?
Update: I'm using Xalan for processing.
This depends on the XSLT processor being used. If the XSLT processor has a good optimizer, in many cases it does the factorization by itself and there is no real speed gain doing this by hand.
However:
"Saving" the result of evaluation in a variable can make the code shorter and more readable.
This is a good application of the DRY (Don't Repeat Yourself) best practices.
Relying on the optimizer doesn't always work.
We shouldn't rely on optimizers when writing portable code that is intended to be executed by more than one XSLT processor -- such as when writing a library of functions/templates.
With some XSLT 2.0 processors, such as Saxon, one can even have xsl:function execution optimized, by turning on function memoization. In the case of Saxon this is done by setting the extension attribute saxon:memo-function to "yes".
In my experience it does, but more importantly it improves the readability of the code. It also makes code reuse simpler.

XSLT: Is there a way to "inherit" canned functionality?

I am once again having to cobble together a bit of XSLT in order to turn generated XML into (rather than simply generating HTML).
I'm having huge déjà vu this time. I'm once again having to solve the same basic problems, e.g.:
how to convert characters into valid HTML entity references
how to preserve whitespace/carriage returns when converting to HTML
how to convert to HTML as opposed to XHTML
how to convert dates from XML format into presentable format
how to tear apart strings with substring
This is all stuff that I've solved many times before. But every time I come back to XSLT I have to start from scratch, re-inventing the wheel every time.
If it were a programming language, I would have a library of canned functions and procedures I can call. I would have subroutines to perform the commonly repeated tasks. I would inherit from a base class that already implements the ugly boilerplate stuff.
Is there any way in XSLT to grow, expand and improve the ecosystem with canned code?
This is all stuff that I've solved many times before. But every time I come back to XSLT I have to start from scratch, re-inventing the wheel every time.
This isn't necessary, of course.
If it were a programming language
Yes, XSLT is a programming language.
I would have a library of canned functions and procedures I can call. I would have subroutines to perform the commonly repeated tasks.
Yes, you can do this in XSLT.
I would inherit from a base class that already implements the ugly boilerplate stuff.
Yes, there is something quite similar in XSLT.
Is there any way in XSLT to grow, expand and improve the ecosystem with canned code?
Even in XSLT 1.0 there are powerful, standard features that support reusability:
<xsl:import>
<xsl:include>
<xsl:apply-templates>
<xsl:call-template>
<xsl:apply-imports>
XSLT 2.0 adds a few even more powerful features:
<xsl:function>
Parameters for <xsl:apply-imports>
<xsl:next-match>
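The "inherit from a base class" idea maps quite closely onto xsl:import plus xsl:apply-imports: an importing stylesheet overrides a template rule and can still call the imported one, much like a subclass method calling its base class. The Python sketch below is a toy model of that dispatch, with hypothetical names; it matches rules by element name only, ignores patterns, priorities and modes, and uses the element's text as a stand-in for the built-in template rules.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

# A template rule receives the matched element and a zero-argument callback
# that plays the role of <xsl:apply-imports/>.
Rule = Callable[[ET.Element, Callable[[], str]], str]
Found = Optional[Tuple["Stylesheet", Rule]]


class Stylesheet:
    """Toy model of xsl:import precedence and xsl:apply-imports."""

    def __init__(self, rules: Dict[str, Rule],
                 imports: Optional[List["Stylesheet"]] = None):
        self.rules = rules
        self.imports = imports or []

    def _find(self, tag: str) -> Found:
        # A stylesheet's own rules win over anything it imports.
        if tag in self.rules:
            return self, self.rules[tag]
        return self._find_in_imports(tag)

    def _find_in_imports(self, tag: str) -> Found:
        # A later xsl:import has higher precedence than an earlier one.
        for imported in reversed(self.imports):
            found = imported._find(tag)
            if found is not None:
                return found
        return None

    @staticmethod
    def _invoke(found: Found, elem: ET.Element) -> str:
        if found is None:
            return elem.text or ""  # stand-in for the built-in template rule
        owner, rule = found
        # apply-imports only considers stylesheets imported by the rule's owner.
        return rule(elem, lambda: Stylesheet._invoke(owner._find_in_imports(elem.tag), elem))

    def apply(self, elem: ET.Element) -> str:
        """Process elem with the highest-precedence matching rule."""
        return self._invoke(self._find(elem.tag), elem)


if __name__ == "__main__":
    base = Stylesheet({"date": lambda e, apply_imports: f"[{e.text}]"})
    custom = Stylesheet({"date": lambda e, apply_imports: "Date: " + apply_imports()},
                        imports=[base])
    elem = ET.fromstring("<date>2024-01-31</date>")
    print(custom.apply(elem))  # Date: [2024-01-31]
    print(base.apply(elem))    # [2024-01-31]
```

In a real library, the "base" module would be a stylesheet of reusable template rules that every project imports, and each project would override only the rules it needs to change.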
There have been several XSLT libraries for quite some time:
FXSL (1.x and 2.x) implements Higher-Order Functions in XSLT 1.0/2.0
FunctX -- a library of useful XSLT 2.0 and XQuery functions.
XPath 2.1 and XSLT 2.1 (since published as XPath 3.0 and XSLT 3.0) add Higher-Order Functions as standard. Functions become first-class datatypes.
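A typical higher-order function is a fold, which FXSL provided as a library function and XPath 3.0 defines as fold-left($seq, $zero, $f). The Python function below only mimics that behavior to show what "functions as first-class values" buys you; it is not an XSLT implementation, and the name fold_left is chosen for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
A = TypeVar("A")


def fold_left(seq: Iterable[T], zero: A, f: Callable[[A, T], A]) -> A:
    """Mimics XPath 3.0 fold-left($seq, $zero, $f).

    The combining function f is an ordinary value passed as an argument:
    it is applied to the accumulator and each item, from left to right.
    """
    acc = zero
    for item in seq:
        acc = f(acc, item)
    return acc


if __name__ == "__main__":
    print(fold_left([1, 2, 3, 4], 0, lambda acc, x: acc + x))         # 10
    print(fold_left(["a", "b", "c"], "", lambda acc, x: x + acc))     # cba
```

One generic function covers summing, reversing, concatenating and many other reductions, which is exactly the kind of reusable building block the question asks for.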

Does LINQ to XML replace XSLT?

Is there anything you can do in XSLT that can't be done in LINQ to XML? Is it still important to learn XSLT? When would you choose one over the other?
Is there anything you can do in XSLT that can't be done in Linq to XML?
No, since LINQ to XML is an API used by Turing-complete programming languages, and covers more of the XML Infoset than the XSLT data model does (e.g. you can fully control the difference between text and CDATA nodes in L2X).
Is it still important to learn XSLT?
Depends on what you're doing. Broadly speaking, yes.
When would you choose one over the other?
XSLT is generally better when you need to do a transformation, i.e. when both input and output are XML. There are a number of reasons for that. First of all, XSLT pattern matching is usually more concise than nested ?: in L2X queries, and far more readable. You can also use * to great effect to set up a default rule (like "copy everything", or "process children but do not generate output"), and then add rules for specific nodes you need to process in a special way, so you do not need to write explicit loops/comprehensions for each node level in the document, as you often do in L2X. Finally, XPath is also more concise than L2X queries (at least in C#), so if you do a lot of non-trivial querying, it's likely to be far shorter and more readable in XSLT.
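The "default rule plus specific overrides" style can be sketched outside XSLT too. The following is a hypothetical Python/ElementTree analogue of an identity transform (it is neither XSLT nor LINQ to XML): every element is copied by default, and elements whose name has an entry in the overrides table are handed to that rule instead. It dispatches on element names only.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

# An override rule returns the replacement element, or None to drop the node.
Override = Callable[[ET.Element], Optional[ET.Element]]


def identity_transform(elem: ET.Element,
                       overrides: Dict[str, Override]) -> Optional[ET.Element]:
    """Copy elem recursively (the "copy everything" default rule), handing any
    element whose tag appears in `overrides` to that rule instead."""
    if elem.tag in overrides:
        return overrides[elem.tag](elem)
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        result = identity_transform(child, overrides)
        if result is not None:
            result.tail = child.tail
            out.append(result)
        elif child.tail:
            # Keep the text that followed a dropped element (a separate text node in XPath).
            if len(out):
                out[-1].tail = (out[-1].tail or "") + child.tail
            else:
                out.text = (out.text or "") + child.tail
    return out


def note_to_remark(elem: ET.Element) -> ET.Element:
    """Hypothetical override: rename <note> to <remark>, keeping only its text."""
    remark = ET.Element("remark")
    remark.text = elem.text
    return remark


if __name__ == "__main__":
    doc = ET.fromstring("<order><id>7</id><secret>x</secret><note>hi</note></order>")
    rules = {"secret": lambda e: None, "note": note_to_remark}
    print(ET.tostring(identity_transform(doc, rules), encoding="unicode"))
    # <order><id>7</id><remark>hi</remark></order>
```

Even in this small sketch the recursion has to be written by hand, whereas in XSLT the identity template and xsl:apply-templates provide it, which is part of why the XSLT version tends to be shorter.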
L2X is generally better when you need to quickly query a document for some value or node. The main advantage here is that there's less runtime overhead (XPath needs to be parsed, an L2X query does not), and you don't need to mess with XmlNamespaceManager and other cruft: the API is streamlined for writing single-expression queries. Also, nested from clauses and let clauses bring it closer to XQuery territory.
L2X is also the only choice when you need an in-place update of the document, and may be better when you only need to replace a few values in the document and an in-place update is an option, since XSLT doesn't let you touch the input in any way.
It is definitely still important to learn XSLT. LINQ to XML is great, but its use is limited to .NET apps.
XSLT can be applied across languages and platforms...even browsers can take XML and apply an XSLT to generate an output.
Don't forget that some .NET application APIs (CMS systems, for example) still require you to supply XSLT to transform internal XML into an output. Ignoring the technology altogether would be, in my opinion, a real mistake.
Not for anyone not using .NET