Does the performance of XSLT improve when xsl variable is used instead of XPath expression?
Update: I'm using Xalan for processing.
Does the performance of XSLT improve when xsl variable is used instead
of XPath expression?
This depends on the XSLT processor being used. If the XSLT processor has a good optimizer, in many cases it does the factorization by itself and there is no real speed gain doing this by hand.
However:
"Saving" the result of evaluation in a variable can make the code shorter and more readable.
This is a good application of the DRY (Don't Repeat Yourself) best practices.
Relying on the optimizer doesn't always work.
We shouldn't rely on optimizers when writing portable code that is intended to be executed by more than one XSLT processor -- such as when writing a library of functions/templates.
With some XSLT 2.0 processors, such as Saxon, one can even have xsl:function execution optimized, by turning on function memoization. In the case of Saxon this is done by setting the extension attribute saxon:memo-function to "yes".
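As a sketch of that Saxon feature (assuming a Saxon edition that recognizes extension attributes; the function name and namespace here are illustrative):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:f="http://example.com/f">

  <!-- With saxon:memo-function="yes", Saxon caches each result, so a
       repeated call with the same argument is a lookup, not a recomputation -->
  <xsl:function name="f:fib" as="xs:integer" saxon:memo-function="yes">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence
        select="if ($n le 1) then $n else f:fib($n - 1) + f:fib($n - 2)"/>
  </xsl:function>
</xsl:stylesheet>
```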
In my experience it does, but more importantly it improves the readability of the code. It also makes code reuse simpler.
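A minimal sketch of the factorization being discussed (element names are illustrative): instead of evaluating the same XPath expression several times, bind it once to an xsl:variable and reference the variable:

```xml
<!-- Before: the same expression is evaluated three times -->
<xsl:if test="count(/order/items/item[@status='open']) > 0">
  <total><xsl:value-of select="count(/order/items/item[@status='open'])"/></total>
  <xsl:apply-templates select="/order/items/item[@status='open']"/>
</xsl:if>

<!-- After: evaluated once, then reused -->
<xsl:variable name="openItems" select="/order/items/item[@status='open']"/>
<xsl:if test="$openItems">
  <total><xsl:value-of select="count($openItems)"/></total>
  <xsl:apply-templates select="$openItems"/>
</xsl:if>
```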
Related
Which XSLT processor would you suggest, for XSLT 1.0 and XSLT 2.0 respectively, in terms of performance? Say one has a huge XML file and would like to make the conversion faster. Does the language the processor is implemented in play a very important role?
For instance, I would like to try a Go language implementation, if one has to suggest such an implementation, or a C language processor?
To be clear: I have made things in both XSLT 1.0 and XSLT 2.0, so I need a way to make the process faster for each specification.
The implementation language for the processor isn't a very important factor. Sure, different languages pose different challenges: if you write an XSLT processor in C (or any other language without garbage collection) then you have to put a lot of effort into memory management, but that affects the cost and effort of writing the processor more than the performance it ends up achieving.
Performance isn't one dimensional. Different processors will compare differently depending on the actual stylesheet. In addition, it depends what you measure: processor start-up time, stylesheet compile time, source document parsing/building time, single-threaded performance or multi-threaded performance. There's no substitute for making your own measurements tailored to your particular workload.
If the performance of your workload isn't meeting requirements, then seeing how it fares on a different XSLT processor is one of the actions you can take to try and solve the problem. But there are many other actions you can take as well, and some of them may be less disruptive. Before trying a different XSLT processor, check to make sure that XSLT processing is actually a critical component of your overall system performance: I've seen plenty of cases where I've been called in to look at perceived XSLT performance issues and it turned out on investigation that the issues were nothing to do with XSLT processing. And where it is XSLT processing, sometimes a simple change to your XSLT code, or to the way you run the XSLT processor, will make a vast saving.
I have an XSLT that I'm executing via the xdmp:invoke() function, and I'm running into very long processing times before seeing any result (in some instances timing out completely after the maximum timeout of 3600s is reached). This XSLT runs in approximately 5 seconds in the Oxygen editor. Some areas I think may be impacting performance:
The XSLT produces multiple output files, using xsl:result-document. The MarkLogic XSLT processor outputs these as result XML nodes, as it cannot physically save these documents to a file system.
The XSLT builds variables that contain xml nodes, which then are processed by other template calls. At times these variables can hold a large set of XML nodes.
I've done some profiling on the XSLT, and building the variables seems to be the most time-consuming part of the execution. I'm wondering why that's the case, and why it runs a lot faster on the Saxon processor.
Any insight is much appreciated.
My understanding is that there are some XSLT performance optimizations that are difficult or impossible to implement in the context of a database in comparison to a filesystem. Also, Saxon is the industry leader in XSLT and is significantly faster than almost anything on the market, although that probably doesn't account for the large discrepancy you describe.
You don't say which version of MarkLogic you're running, but version 8.0 has made significant improvements in XSLT performance. A few simple tests I ran suggested 3-4x speed improvement, depending on the XSLT.
I have run into some rare but serious performance edge cases for XSLT when running MarkLogic on Windows. Linux and OS X builds don't appear to have this problem. It is also far more pronounced when the XSLT tasks are running on multiple threads.
It is possible, however, to save data directly to the filesystem instead of the database using xdmp:save.
Unless your XSLTs involve very complex templating rules, I would recommend at least testing some of the performance-sensitive XSLT logic in XQuery. It may be possible to port the slowest parts and pass the results of those queries to the XSLT. It's not ideal, but you might be able to achieve acceptable performance without rewriting the XSLTs.
Another idea, if the problem is simply the construction of variables in a multi-pass XSLT, is to break the XSLT into multiple XSLTs and make multiple calls to xdmp:xslt-invoke from XQuery. However, I know there is some overhead to making an xdmp:xslt-invoke call, so it may be a wash, or it may be worse.
I have come across similar performance issues with stylesheets in ML 7. Come to think of it, I had stylesheets similar to the ones you have mentioned, i.e. variables holding sequences of nodes. It seems XSLT cannot be optimised as well as XQuery is. If you are not satisfied with the performance of your stylesheets, I would recommend converting the XSLT to its equivalent XQuery. I did this and achieved about 1~1.5 secs of performance gain. It may be worth the effort :)
Well in my case, it seems that using the fn:not() function in template match rules is causing the slow performance. Perhaps if someone else is experiencing the same problem this might be a good starting point.
What is involved in upgrading from XSLT 1.0 to 2.0?
1 - What are the possible reasons for upgrading?
2 - What are the possible reasons for NOT upgrading?
3 - And finally, what are the steps to upgrading?
I'm hoping for an executive summary--the short version :)
What is involved in upgrading from XSLT 1.0 to 2.0?
1 - What are the possible reasons for upgrading?
If you are an XSLT programmer you'll benefit greatly from the more convenient and expressive XSLT 2.0 language + XPath 2.0 and the new XDM (XPath Data Model).
You may want to watch this XSLT 2.0 Pluralsight course to get a firm and systematic understanding of the power of XSLT 2.0.
You have:
Strong typing and all XSD types available.
The ability to define your own (schema) types.
The XPath 2.0 sequence type, which has no counterpart (is simply missing) in XPath 1.0.
The ability to define and write functions in pure XSLT -- the xsl:function instruction.
Range variables in XPath expressions (the for clause).
Much better and more powerful string processing -- XPath 2.0 supports regular expressions in its tokenize(), matches() and replace() functions.
Much better and more powerful string processing -- XSLT 2.0 support for regular expressions -- the xsl:analyze-string, xsl:matching-substring and xsl:non-matching-substring new XSLT instructions.
More convenient, powerful and expressive grouping: the xsl:for-each-group instruction.
A lot of new, very powerful XPath 2.0 functions -- such as the functions on date, time and duration, just to name a few.
The new XPath operators intersect, except, is, >>, <<, some, every, instance of, castable as, ..., etc.
The general XPath operators >, <, etc. now work on any ordered value type (not only on numbers as in XPath 1.0).
New, safer value comparison operators: lt, le, eq, gt, ge, ne.
The XPath 2.0 to operator, allowing constructs such as xsl:for-each select="1 to $N".
These, and many other improvements/new features significantly increase the productivity of any XSLT programmer, which allows XSLT 2.0 development to be finished in a small fraction of the time necessary for developing the same modules with XSLT 1.0.
Strong typing allows many errors to be caught at compile time and to be corrected immediately. For me this strong type-safety is the biggest advantage of using XSLT 2.0.
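To illustrate a few of these features working together (the function name and input are illustrative, not from any standard library):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my">

  <!-- xsl:function: a typed, reusable function written in pure XSLT -->
  <xsl:function name="my:initials" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <!-- tokenize() brings regex-based string splitting to XPath 2.0;
         the 'for' clause provides range variables in the expression -->
    <xsl:sequence select="string-join(for $w in tokenize($name, '\s+')
                                      return upper-case(substring($w, 1, 1)), '.')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- the 'to' operator yields an integer range to iterate over -->
    <xsl:for-each select="1 to 3">
      <line><xsl:value-of select="my:initials('ada byron lovelace')"/></line>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```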
2 - What are the possible reasons for NOT upgrading?
It is often possible, reasonable and cost-efficient to leave existing, legacy XSLT 1.0 applications untouched and to continue using them with XSLT 1.0, while at the same time developing only new applications using XSLT 2.0.
Your management + any other non-technical reasons.
Having a lot of legacy XSLT 1.0 applications written in a poor style (e.g. using DOE (disable-output-escaping) or extension functions that now need to be rewritten and the code refactored).
Not having available an XSLT 2.0 processor.
3 - And finally, what are the steps to upgrading?
Change the version attribute of the xsl:stylesheet or xsl:transform element from "1.0" to "2.0".
Remove any xxx:node-set() functions.
Remove any DOE.
Be ready for the surprise that xsl:value-of now outputs not just the first, but all items of a sequence.
Try to use the new xsl:sequence instruction as much as possible -- use it to replace any xsl:copy-of instructions; use it instead of xsl:value-of any time when the type of the output isn't string or text node.
Test extensively.
When the testing has verified that the code works as expected, start refactoring (if deemed necessary). It is a good idea to declare types for any variables, parameters, templates and functions. Doing so may reveal new, hidden errors and fixing them increases the quality of your code.
Optionally, decide which named templates to rewrite as xsl:function.
Decide if you still need some extension functions that are used in the old version, or you can rewrite them easily using the new, powerful capabilities of XSLT.
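The xsl:value-of surprise in the steps above can be seen with a small sketch (the input is assumed to be <nums><n>1</n><n>2</n><n>3</n></nums>):

```xml
<!-- With version="1.0": outputs "1" (first node only) -->
<!-- With version="2.0": outputs "1 2 3" (all items, space-separated) -->
<xsl:value-of select="/nums/n"/>

<!-- To output the nodes themselves rather than their string value,
     prefer xsl:sequence in 2.0 (it can replace xsl:copy-of) -->
<xsl:sequence select="/nums/n"/>
```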
Final remarks: Not all of the above steps are necessary, and one can stop and declare the migration successful once testing shows zero bugs. It is much cleaner to start using all XSLT 2.0/XPath 2.0 features in new projects.
Dimitre's answer is very comprehensive and 100% accurate (as always) but there is one point I would add. When upgrading to a 2.0 processor, you have a choice of leaving the version attribute set to "1.0" and running in "backwards compatibility mode", or changing the version attribute to "2.0". People often ask which approach is recommended.
My advice is, if you have a good set of tests for your stylesheets, take the plunge: set version="2.0", run the tests, and if there are any problems, fix them. Usually the problems will be code that was never quite right in the first place and only worked by accident. But if you don't have a good set of tests and are concerned about the reliability of your workload, then leaving version="1.0" is a lower-risk approach: the processor will then emulate all the quirks of XSLT 1.0, such as xsl:value-of ignoring all but the first item, and the strange rules for comparing numbers with strings.
I've heard that most of the time it's usually possible (and better) to use apply-templates rather than for-each when writing an XSLT. Is this true? If so, what are the benefits of using apply-templates?
Using <xsl:for-each> is in no way harmful if one knows exactly how an <xsl:for-each> is processed.
The trouble is that a lot of newcomers to XSLT that have experience in imperative programming take <xsl:for-each> as a substitute of a "loop" in their favorite PL and think that it allows them to perform the impossible -- like incrementing a counter or any other modification of an already defined <xsl:variable>.
One indispensable use of <xsl:for-each> in XSLT 1.0 is to change the current document -- this is often needed in order to use the key() function on a document different from the current source XML document, for example to efficiently access a lookup table that resides in its own XML document.
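A minimal sketch of that idiom (the file name, key name, and element structure are illustrative):

```xml
<xsl:key name="byCode" match="entry" use="@code"/>
<xsl:variable name="lookup" select="document('lookup.xml')"/>

<xsl:template match="item">
  <xsl:variable name="code" select="@code"/>
  <!-- xsl:for-each switches the current document to the lookup document,
       so key() searches there instead of in the main source tree -->
  <xsl:for-each select="$lookup">
    <label><xsl:value-of select="key('byCode', $code)/@label"/></label>
  </xsl:for-each>
</xsl:template>
```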
On the other side, using <xsl:template> and <xsl:apply-templates> is much more powerful and elegant.
Here are some of the most important differences between the two approaches:
xsl:apply-templates is much richer and deeper than xsl:for-each, even simply because we don't know what code will be applied to the nodes of the selection -- in the general case this code will be different for different nodes of the node-list.
The code that will be applied can be written well after the xsl:apply-templates was written, and by people who do not know the original author.
The FXSL library's implementation of higher-order functions (HOF) in XSLT wouldn't be possible if XSLT didn't have the <xsl:apply-templates> instruction.
Summary: Templates and the <xsl:apply-templates> instruction are how XSLT implements and deals with polymorphism.
Reference: See this whole thread: http://www.stylusstudio.com/xsllist/200411/post60540.html
I hear time and time again about how you should avoid the use of XSLT for-each. That it's your inner imperative programming demon that should be banished.
What's so bad about it?
Does this best practice matter depending on the size of XML (i.e 100 vs 10,000 nodes)?
Here is the essential difference between <xsl:apply-templates> and <xsl:for-each> that nobody has pointed out:
<xsl:apply-templates> is really something much more than a nicer, more elegant equivalent of <xsl:for-each>:
xsl:apply-templates is much richer and deeper than xsl:for-each, even simply because we don't know what code will be applied to the nodes of the selection -- in the general case this code will be different for different nodes of the node-list.
Also, the code that will be applied can be written well after the xsl:apply-templates was written, and by people who do not know the original author.
2. On the other hand, using <xsl:for-each> is in no way harmful if one knows exactly how an <xsl:for-each> is processed.
The trouble is that a lot of newcomers to XSLT that have experience in imperative programming take <xsl:for-each> as a substitute of a "loop" in their favorite PL and think that it allows them to perform the impossible -- like incrementing a counter or any other modification of an already defined <xsl:variable>.
One indispensable use of <xsl:for-each> is to change the current document -- this is often needed in order to use the key() function on a document different from the current source XML document, for example to efficiently access a lookup table that resides in its own XML document.
Templates tend to split up the code more nicely. Also, for-each loops suffer from the fact that people often come to them with the idea that they operate identically to the way for loops work in the major programming languages.
It is the use of a for-each to call templates that is discouraged, not the widespread use of for-each, per se. Even the Muenchian method of grouping relies on xsl:key constructs with xsl:for-each loops.
The idea of writing good XSLT is that the structure of your XML should dictate what templates are matched and then applied. Therefore, whenever possible, use apply-templates to select the nodes, rather than applying a select using for-each.
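A small sketch of the two styles (element names are illustrative):

```xml
<!-- "Pull" style: for-each selects the nodes and processes them inline -->
<xsl:template match="library">
  <xsl:for-each select="book">
    <title><xsl:value-of select="title"/></title>
  </xsl:for-each>
</xsl:template>

<!-- "Push" style: let template matching dictate what happens to each node -->
<xsl:template match="library">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="book">
  <title><xsl:value-of select="title"/></title>
</xsl:template>
```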
Quick answer: XSLT is largely functional in nature, and imperative loops are not very functional.
In general the best way to get the most out of XSLT is to use pattern matching as much as possible (xsl:apply-templates rather than loops, ifs and call-template1).
In the end, it is all about style, and will make little difference in short pieces of XSLT, but being more functional will help with longer/complex problems.
1 Other than when creating functions which return values rather than modifying the output.
Using apply-templates, perhaps with a mode, makes it easier to include appropriate transformations of more kinds of elements that may be of interest at that location.
For example, if you had XML containing a library's contents and you kept using for-each all over the place to select books, then when you start keeping records of DVDs, CDs, Ginger Nut biscuits, etc., you will have to examine all the for-each sections to include more than just books. If you had used apply-templates, you might be able to simply create new matches for the new elements.
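In that scenario, extending the transformation is just a matter of adding a new template rule (element names are illustrative):

```xml
<!-- The dispatch point never needs to change -->
<xsl:template match="library">
  <catalogue>
    <xsl:apply-templates/>
  </catalogue>
</xsl:template>

<xsl:template match="book">
  <item kind="book"><xsl:value-of select="title"/></item>
</xsl:template>

<!-- When DVDs appear in the data, just add a rule; no for-each to revisit -->
<xsl:template match="dvd">
  <item kind="dvd"><xsl:value-of select="title"/></item>
</xsl:template>
```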