MarkLogic XSLT performance

I have an XSLT that I'm executing via the xdmp:invoke() function, and I'm running into very long processing times before I see any result (in some instances timing out completely after the maximum timeout of 3600s is reached). The same XSLT runs in approximately 5 seconds in the Oxygen editor. Some areas I think may be impacting performance:
The XSLT produces multiple output files, using xsl:result-document. The MarkLogic XSLT processor outputs these as result XML nodes, as it cannot physically save these documents to a file system.
The XSLT builds variables that contain XML nodes, which are then processed by other template calls. At times these variables can hold a large set of XML nodes.
I've done some profiling on the XSLT, and building the variables appears to be the most time-consuming part of the execution. I'm wondering why that is, and why it runs so much faster on the Saxon processor.
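For reference, the variable-building pattern looks roughly like this (element and mode names are made up). I've read that a variable constructed with content but no "as" attribute builds a temporary document node and copies every result node into it, which may be part of the cost:

<!-- Content-constructed variable: builds a temporary document and copies nodes -->
<xsl:variable name="filtered">
  <xsl:apply-templates select="//record" mode="filter"/>
</xsl:variable>

<!-- XSLT 2.0: typing the variable avoids the document wrapper and the extra copy -->
<xsl:variable name="filtered-typed" as="element()*">
  <xsl:apply-templates select="//record" mode="filter"/>
</xsl:variable>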
Any insight is much appreciated.

My understanding is that some XSLT performance optimizations are difficult or impossible to implement in the context of a database, compared to a filesystem. Also, Saxon is the industry leader in XSLT and is significantly faster than almost anything else on the market, although that probably doesn't account for the large discrepancy you describe.
You don't say which version of MarkLogic you're running, but version 8.0 has made significant improvements in XSLT performance. A few simple tests I ran suggested 3-4x speed improvement, depending on the XSLT.
I have run into some rare but serious performance edge cases for XSLT when running MarkLogic on Windows. Linux and OS X builds don't appear to have this problem. It is also far more pronounced when the XSLT tasks are running on multiple threads.
It is possible, however, to save data directly to the filesystem instead of the database using xdmp:save.
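A minimal sketch of that (paths and URIs are hypothetical; this assumes the transform is run via xdmp:xslt-invoke, which returns secondary result documents created by xsl:result-document as additional nodes in the result sequence):

xquery version "1.0-ml";
(: Run the transform; xsl:result-document outputs come back as extra nodes :)
let $results := xdmp:xslt-invoke("/transforms/split.xsl", fn:doc("/input.xml"))
for $result at $i in $results
return xdmp:save(fn:concat("/tmp/output/result-", $i, ".xml"), $result)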
Unless your XSLTs involve very complex templating rules, I would recommend at least testing some of the performance-sensitive XSLT logic in XQuery. It may be possible to port the slowest parts and pass the results of those queries to the XSLT. It's not ideal, but you might be able to achieve acceptable performance without rewriting the XSLTs.
Another idea, if the problem is simply the construction of variables in a multi-pass XSLT, is to break the XSLT into multiple XSLTs and make multiple calls to xdmp:xslt-invoke from XQuery. However, I know there is some overhead to making an xdmp:xslt-invoke call, so it may be a wash, or it may be worse.
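As a rough sketch (the stylesheet paths are hypothetical), the multi-pass version would chain the calls, feeding each pass's output to the next:

xquery version "1.0-ml";
(: Each pass is its own stylesheet; the output node feeds the next pass :)
let $pass1 := xdmp:xslt-invoke("/transforms/pass1.xsl", fn:doc("/input.xml"))
let $pass2 := xdmp:xslt-invoke("/transforms/pass2.xsl", $pass1)
return xdmp:xslt-invoke("/transforms/pass3.xsl", $pass2)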

I have come across similar performance issues with stylesheets in ML 7. Come to think of it, I had stylesheets similar to the ones you mention, i.e. variables holding sequences of nodes. It seems XSLT cannot be optimised as well as XQuery can. If you are not satisfied with the performance of your stylesheets, I would recommend converting the XSLT to its equivalent XQuery. I did this and achieved performance gains of about 1-1.5 seconds. It may be worth the effort :)

Well, in my case it seems that using the fn:not() function in template match rules is causing the slow performance. If someone else is experiencing the same problem, this might be a good starting point.
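The pattern I mean looks like the first template below (names are made up); one possible rewrite is to match the special case positively and let template priority sort out the overlap, avoiding the not() test:

<!-- A negated predicate, evaluated for every candidate node -->
<xsl:template match="item[not(@type = 'internal')]">
  <xsl:copy-of select="."/>
</xsl:template>

<!-- Possible rewrite: match the special case positively; priority resolves the overlap -->
<xsl:template match="item">
  <xsl:copy-of select="."/>
</xsl:template>
<xsl:template match="item[@type = 'internal']" priority="1"/>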

Related

Fastest XSLT 1.0 and XSLT 2.0 processors in terms of performance

Which XSLT processor would you suggest for XSLT 1.0 and XSLT 2.0 respectively, in terms of performance? Say one has a huge XML file and would like to make the conversion faster. Does the language the processor is implemented in play a very important role?
For instance, I would like to try a Go implementation, if anyone can suggest such an implementation, or a C processor.
To be clear: I have built things in both XSLT 1.0 and XSLT 2.0, so I need a way to make the process faster for each specification.
The implementation language for the processor isn't a very important factor. Sure, different languages pose different challenges: if you write an XSLT processor in C (or any other language without garbage collection) then you have to put a lot of effort into memory management, but that affects the cost and effort of writing the processor more than the performance it ends up achieving.
Performance isn't one dimensional. Different processors will compare differently depending on the actual stylesheet. In addition, it depends what you measure: processor start-up time, stylesheet compile time, source document parsing/building time, single-threaded performance or multi-threaded performance. There's no substitute for making your own measurements tailored to your particular workload.
If the performance of your workload isn't meeting requirements, then seeing how it fares on a different XSLT processor is one of the actions you can take to try to solve the problem. But there are many other actions you can take as well, and some of them may be less disruptive. Before trying a different XSLT processor, check that XSLT processing is actually a critical component of your overall system performance: I've seen plenty of cases where I've been called in to look at perceived XSLT performance issues and it turned out on investigation that the issues had nothing to do with XSLT processing. And where it is XSLT processing, sometimes a simple change to your XSLT code, or to the way you run the XSLT processor, will make a vast saving.

XPath expression vs XSL variable

Does the performance of XSLT improve when an xsl:variable is used instead of an XPath expression?
Update: I'm using Xalan for processing.
"Does the performance of XSLT improve when an xsl:variable is used instead of an XPath expression?"
This depends on the XSLT processor being used. If the XSLT processor has a good optimizer, in many cases it does the factorization by itself and there is no real speed gain doing this by hand.
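For illustration, the kind of factoring in question looks like this (a hypothetical snippet):

<!-- Repeated evaluation: the same path expression is computed twice -->
<xsl:value-of select="sum(/order/item/@price) div count(/order/item)"/>

<!-- Factored: the node-set is evaluated once and reused -->
<xsl:variable name="items" select="/order/item"/>
<xsl:value-of select="sum($items/@price) div count($items)"/>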
However:
"Saving" the result of evaluation in a variable can make the code shorter and more readable.
This is a good application of the DRY (Don't Repeat Yourself) best practices.
Relying on the optimizer doesn't always work.
We shouldn't rely on optimizers when writing portable code that is intended to be executed by more than one XSLT processor -- such as when writing a library of functions/templates.
With some XSLT 2.0 processors, such as Saxon, one can even have xsl:function execution optimized, by turning on function memoization. In the case of Saxon this is done by setting the extension attribute saxon:memo-function to "yes".
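A minimal sketch of what that looks like (the function itself is hypothetical); Saxon then caches the result per distinct argument value:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:f="urn:example:functions">

  <!-- Repeated calls with the same $category return the cached result -->
  <xsl:function name="f:discount" as="xs:decimal" saxon:memo-function="yes">
    <xsl:param name="category" as="xs:string"/>
    <xsl:sequence select="if ($category = 'sale') then 0.8 else 1.0"/>
  </xsl:function>

</xsl:stylesheet>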
It is my experience that it does, but more importantly, it improves the readability of the code. It also makes code reuse simpler.

Which XSLT processor actually took advantage of parallel-processing?

XSLT processing has the potential to be very fast because many of its language constructs allow things to be processed in parallel.
However, even though things can theoretically run in parallel and processing can be insanely fast, in practice is there an actual implementation of an XSLT processor that takes advantage of this potential and actually runs things in parallel?
You'll probably have to look at the high-end commercial XSLT processors (Datapower, Intel) for this kind of capability. There's very little technical information available about either, but there have been one or two conference papers describing techniques that may or may not have found their way into product.
(Personally, I have a bit of a feeling that both these products sell on the basis that if the product is expensive, it must be good. But that feeling is based solely on the absence of information, rather than on any real knowledge.)
Saxon's documentation (http://www.saxonica.com/documentation/extensions/attributes/threads.xml) documents an extension attribute for xsl:for-each to specify the number of threads to be used to execute the for-each. It is only available in the commercial version of Saxon; I haven't used it, so I can't tell you more about it.
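From that documentation, usage looks roughly like this (an untested sketch; the select expression and thread count are made up):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">

  <xsl:template match="/">
    <!-- Saxon-EE distributes the iterations across 4 worker threads -->
    <xsl:for-each select="/catalogue/chapter" saxon:threads="4">
      <xsl:apply-templates select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>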

One large XSLT over smaller, more granular ones

We have one large XSLT that renders a whole shop area, including products and manufacturers, and does filtering based on price and category on top of that.
I'm using Sitecore as a CMS and I'm having problems with caching. I have about 9,000 items, and some pages take as much as 20s to render.
Would it be better to split the XSLT into smaller parts? Does that improve speed?
I think the XSLT engine Sitecore uses is called Nexus.
Update:
I think we need to optimise the XSLT. Even though there were only about 9k items, the Sitecore profiler showed we're actually traversing about 250k items while doing various checks.
You will probably get better performance by applying changes other than splitting the XSLT file. Without seeing the XSLT it is hard to spot bottlenecks, but you will find some best practices for XSLT performance here:
http://www.dpawson.co.uk/xsl/sect4/N9883.html#d15756e150
In addition, it might be very helpful to use an XSLT profiler in that case.
Some performance tricks also depend on the engine that you are using, so some additional information might be useful here as well.
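For instance, a traversal count so far above the item count often comes from re-scanning the whole tree inside a loop; an xsl:key can replace that with an indexed lookup. A hypothetical sketch, since the actual XSLT isn't shown:

<!-- Index products once instead of re-walking the tree per category -->
<xsl:key name="product-by-category" match="product" use="@category"/>

<xsl:template match="category">
  <!-- key() is an indexed lookup, not a fresh traversal -->
  <xsl:apply-templates select="key('product-by-category', @id)"/>
</xsl:template>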
If you could post your XSLT code I might help you in finding possible bottlenecks.
Sounds like the problem is with Sitecore, not XSLT (I've done faster transforms against tens of thousands of rows), but I'd advise splitting generally to enable code reuse.
Separating one huge rendering into smaller ones will help if you use Sitecore caching. Having multiple renderings will allow you to apply individual cache settings to each.
There are two different issues here:
Separating XSLT files for better readability, maintainability and code reuse
Making performance improvements on your XSLT translations
The first should be done as a best practice; the latter should take care of the extended rendering times you are getting.
Definitely use the smallest XSLTs that make sense. That's just good practice and can't hurt performance.

Memory-efficient XSLT Processor

I need a tool to execute XSLTs against very large XML files. To be clear, I don't need anything to design, edit, or debug the XSLTs, just execute them. The transforms that I am using are already well optimized, but the large files are causing the tool I have tried (Saxon v9.1) to run out of memory.
I found a good solution: Apache's Xalan C++. It provides a pluggable memory manager, allowing me to tune allocation based on the input and transform.
In multiple cases it is consuming ~60% less memory (I'm looking at private bytes) than the others I have tried.
You may want to look into STX for streaming-based XSLT-like transformations. Alternatively, I believe StAX can integrate with XSLT nicely through the Transformer interface.
It sounds like you're sorted - but often, another potential approach is to split the data first. Obviously this only works with some transformations (i.e. where different chunks of data can be treated in isolation from the whole) - but then you can use a simple parser (rather than a DOM) to do the splitting into manageable pieces, then process each chunk separately and reassemble.
Since I'm a .NET bod, things like XmlReader can do the chunking without a DOM; I'm sure there are equivalents for every language.
Again - just for completeness.
[edit re question]
I'm not aware of any specific name; maybe Divide and Conquer.
For example: if your data is actually a flat list of like objects, then you could simply split the first-level children - i.e. rather than having 2M rows, you split it into 10 lots of 200K rows, or 100 lots of 20K rows. I've done this lots of times before when working with bulk data (for example, uploading chunks of data [all valid] and re-assembling at the server so that each individual upload is small enough to be robust).
For what it's worth, I suspect that for Java, Saxon is as good as it gets if you need to use XSLT. It is quite efficient (both CPU and memory) for larger documents, but XSLT itself essentially forces a full in-memory tree of the contents to be created and retained, except in limited cases. Saxon-SA (the for-fee version) supposedly has extensions to allow taking advantage of such "streaming" cases, so that might be worth checking out.
But the advice to split up the contents is the best one: if you are dealing with independent records, just split the input using other techniques (like, use StAX! :-) )
I have found that a custom tool built to run the XSLT using earlier versions of MSXML makes it very fast, but it also consumes incredible amounts of memory and will not actually complete if the input is too large. You also lose out on some advanced XSLT functionality, as the earlier versions of MSXML don't support the full XPath feature set.
It is worth a try if your other options take too long.
That's an interesting question. XSLT could potentially be optimized for space, but I expect all but the most obscure implementations start by parsing the source document into a DOM, which is bound to use a low multiple of the document size in memory.
Unless the stylesheet is specially designed to support a single-pass transformation, reasonable time performance would probably require parsing the source document into a disk-based hierarchical database.
I do not have an answer, though.
It appears that Saxon 9.2 may provide an answer to your problem. If your document can be transformed without using predicates (i.e. it does not reference any siblings of the current node), you may be able to use streaming XSLT.
See this link
I have not tried this myself, I am just reading about it. But I hope it works.
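From what I've read, the idea (later standardized as XSLT 3.0 streaming) looks roughly like this; an untested sketch assuming a simple document of record elements with amount attributes:

<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Every template in the default mode must be "guaranteed streamable" -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/records">
    <!-- A single downward pass: the tree is never retained in memory -->
    <total><xsl:value-of select="sum(record/@amount)"/></total>
  </xsl:template>

</xsl:stylesheet>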
Are you using the Java version of Saxon, or the .NET port? You can assign more memory to the Java VM running Saxon if you are running out of memory (using the -Xmx command-line parameter).
I've also found that the .NET version of Saxon runs out of memory less easily than the Java version.
For .NET you can use the solution suggested in the Microsoft Knowledge Base:
http://support.microsoft.com/kb/307494
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// XPathDocument is a read-only store optimized for XPath and XSLT
XPathDocument srcDoc = new XPathDocument(srcFile);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
myXslTransform.Load(xslFile);
// Stream the output straight to disk rather than buffering it in memory
using (XmlWriter destDoc = XmlWriter.Create(destFile))
{
    myXslTransform.Transform(srcDoc, destDoc);
}
Take a look at Xselerator