xlst: from base xml to intermediate xml to final output - xslt

This is a follow up on my earlier question xslt split mp3 tag into artist and title
I'll try to phrase it in a generic terms because I think it will me allow for a better understanding of XSLT: what and how use it using the appropriate XSLT idoms.
This is what I want:
input XML -> intermediate XML -> ... -> final transformation
Or in other words: how can I pipeline various XML transformations in one XSLT document?
My command-line analogy would be to have multiple command line tools that perform parts of the solution, then have them execute in succeeding order using pipes.
In this specific case:
input XML (with element) -> intermediate XML (with separate and element) -> final XML sorted by ,
I'm limited to one XSLT document as the web-tool at hand does not even allow xsl:include or xsl:import to succeed.

Three approaches that come readily to mind are:
Use operating-system pipelines:
xsltproc ss1.xsl input.xml \\
| xsltproc ss2.xsl - \\
| xsltproc ss3.xsl - \\
> output.xml
The primary downside I'm aware of here is that not all processors have command-line interfaces that make it easy to read the main input tree on stdin. So when I do this, I sometimes end up writing temporary files; fortunately, disk space is cheap. Upside: you probably already know how to do this.
Use XProc pipelines.
Primary downside: you have to learn a new technology. Primary upside: you get to learn a new technology, which is actually quite cool.
Define different modes for the different operations and use XSLT 2.0 (or an XSLT 1.0 processor with some form of the node-set extension) to process the data:
<xsl:template match="/">
<xsl:variable name="tree1">
<xsl:apply-templates mode="mode1"/>
</xsl:variable>
<xsl:variable name="tree2">
<xsl:apply-templates mode="mode2" select="$tree1"/>
</xsl:variable>
<xsl:apply-templates mode="mode3" select="$tree2"/>
</xsl:template>
Upside: it's all in a single stylesheet, so you never have to puzzle out how to run the process, when you come back to it six months later. (And the phrasing of your question says that this is the answer you really want.) Downside: it's all in a single stylesheet, so you have to work harder to achieve modularity and separation of concerns.
There are doubtless other approaches as well.

Related

Multiple XSLT files in a single pipeline with ant

I have multiple XSLT files that I'm using to process my source XML in a pipeline. I know about the trick with exsl:node-set but after having some issues with this workflow, I took the decision to split the various passes into separate XSL files. I'm much happier with the structure of the files now and the workflow works fine in Eclipse. Our release system works with ant. I can process the files like this:
<xslt basedir="src-xml" style="src-xml/preprocess_1.xsl" in="src-xml/original.xml" out="src-xml/temp_1.xml" />
<xslt basedir="src-xml" style="src-xml/preprocess_2.xsl" in="src-xml/temp_1.xml" out="src-xml/temp_2.xml" />
<xslt basedir="src-xml" style="src-xml/preprocess_3.xsl" in="src-xml/temp_2.xml" out="src-xml/temp_3.xml" />
<xslt basedir="src-xml" style="src-xml/finaloutput.xsl" in="src-xml/temp_3.xml" out="${finaloutput}" />
But this method, going via multiple files on disk, seems inefficient. Is there a better way of doing this with ant?
Update following Dimitre's suggestion
I've created myself a wrapper around the various other XSLs, as follows:
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:fn='http://www.w3.org/2005/xpath-functions' xmlns:exslt="http://exslt.org/common">
<xsl:import href="preprocess_1.xsl"/>
<xsl:import href="preprocess_2.xsl"/>
<xsl:import href="preprocess_3.xsl"/>
<xsl:import href="finaloutput.xsl"/>
<xsl:output method="text" />
<xsl:template match="/">
<xsl:apply-imports />
</xsl:template>
</xsl:stylesheet>
This has... not worked well. It looks like the document had not been preprocessed before the final output XSL ran. I should perhaps have been clearer here: the preprocess XSL files are modifying the document, adding attributes and the like. preprocess_3 is based on the output of ..._2 is based on ..._1. Is this import solution still appropriate? If so, what am I missing?
The more efficient method is to perform a single, multipass transformation.
The files can remain as they are -- they will be imported using xsl:import instructions.
The savings are obvious:
Just one initiation (loading of the XSLT processor).
Just one termination.
Eliminates the two intermediate files and their creation, writing into, closing and deleting.
Hmm, you say I know about the trick with exsl:node-set, but you don't use it in your attempt ("Update following Dimitre's suggestion"). In case you don't know it, or for the others (like me) who don't know how to perform multipass transformation, here is a nice article: Multipass processing.
The drawback of this approach is that it requires engine specific xsl code. So if you know the engine, you could try this. If you don't know the engine, you could try with solutions from result tree fragment to node-set: generic approach for all xsl engines.
Looking at these sources one conclusion is sure: your current solution is more readable. But you are seeking efficiency, so some readability may be sacrificed.

How to use XML wrapped in CDATA inside another XML for XSL transformation?

An XML document contains another XML element, which is wrapped in CDATA.
How can the wrapped XML be used for XSL and XSL-FO transformation (version 1)?
If you are willing to take multiple transformation steps then it is possible. Output the relevant section with disable-output-escaping to turn the escaped XML into valid XML. Process it in a subsequent step.
It does require the escaped XML to be well-formed. And some parsers require the intermediate result to be serialized (to disk or else) first to make sure the escaped XML is properly unescaped before it enters the subsequent transformations.
This is not possible with standard XSLT 1.0 or 2.0, in a single transformation.
It can be done using Saxon 9 Professional Edition or Enterprise Edition. These products have a saxon:parse() extension function. Or use the XPath 3.0 parse-xml() function, which is also supported by recent versions of Saxon PE/EE.
As #grtjn points out, it is possible to do it with a two-pass process. Stylesheet 1 turns the CDATA-wrapped text into parseable XML (using <xsl:value-of select="whatever" disable-output-escaping="yes"/>). Stylesheet 2 then processes the serialized result produced by stylesheet 1.

XSLT getting character count of transformed XML

I am creating some XML from an XSLT
the XML after transformation looks a little like...
<root><one><two>dfd</two></one></root>
I need to get a character count for the output (in this case would be 38).
I tried putting the whole lot in a variable then doing a string-length($vVariable) but this only brings back 3 (for the 'dfd' it excludes the characters of the tags)
This is going to be very difficult to do in straight XSLT, since it's internal data model doesn't see XML elements as strings. Although your particular example is very simple, there are multiple valid ways to serialize the same XML into text, especially when you get into namespaces.
Your best bet may be to send the result of your transformation to another tool. If you're running the XSLT processor from the command line, you could use a tool like the linux command "wc"). If you're calling XSLT from within a larger program, you could use that language's built-in string-length functionality.

XSLT Unit testing

Does anyone know of a way to write unit tests for the XSLT transformation?
I've a lot of XSLT files and it's getting harder to test them manually. We have an example XML and can compare it to the resulting output XML from the XSL transormation. However, I'm looking for a better test method.
I am currently looking for some good options to do this as well. As a result, I came across this question, and a few other potential candidate solutions. Admittedly, I haven't tried any of them yet, so I can't speak to their quality, but at least they are some other avenues potentially worthy of researching.
Jenni Tennison's Unit Testing Package
UTF-X Unit Testing Framework
Juxy
XTC
Additionally, I found the following article to be informative in terms of a general methodology for unit testing XSLT.
Unit test XSL transformations
Try XSpec, a testing framework for XSLT. It allows you to write tests declaratively, and test templates and functions.
Looks like Oxygen editor has Unit Testing available as well. It "provides XSLT Unit Test support based on XSpec".
I haven't tried it myself, but will soon.
Here are a few simple solutions:
Use xsltproc with a mock XML file:
xsltproc test.xsl mock.xml
XSLT Cookbook - Chapter 13
Create a document() placeholder variable and comment/uncomment it manually:
<xsl:variable name="Data" select="descendant-or-self::node()"/>
<!--
<xsl:variable name="Data" select="document('foo.xml')" />
-->
<xsl:if test="$Data/pagename='foo'">
<p>hi</p>
</xsl:if>
Create a condition to swap the comment programmatically:
<xsl:variable name="Data">
<xsl:choose>
<!-- If source XML is inline -->
<xsl:when test="descendant-or-self::node()/pageName='foo'"/>
<xsl:value-of select="descendant-or-self::node()"/>
</xsl:when>
<!-- If source XML is external -->
<xsl:otherwise>
<xsl:value-of select="document('foo.xml')" />
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
Use a shell script to inline the data programmatically in the build to automate the tests completely.
References
Transformiix Test Cases
Running XSLT at the Department: Command Line XSLT Processing
Building TransforMiiX standalone - Archive of obsolete content | MDN
OASIS XSLT Conformance TC Public Documents
Using XSLT to Assist Regression Testing
MicroHowTo: Process an XML document using an XSLT stylesheet
Tip: Debug stylesheets with xsl:message
Batch XSLT Processing
Embedded Stylesheet Modules: XSL Transformations (XSLT) Version 3.0
Multi layer conditional wrap HTML with XSLT
XPath 1.0: Axes
CentOS 7.0 - man page for xsltproc
XMLStarlet command line XML toolkit download | SourceForge.net
We have been using Java based unit test cases, in which we provide expected xml string after transformation and input xml string which needs to be transformed using some XSL.
Refer to following package if you want to explore more.
org.custommonkey.xmlunit.Transform
org.custommonkey.xmlunit.Diff
org.custommonkey.xmlunit.DetailedDiff
I´m using this tool: jxsltunit.
The test is defined by an XML file which is then passed to the tool. This is an example of the test configuration:
<xsltTestsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="jxsltunit jxslttestsuite.xsd" xmlns="jxsltunit"
description="Testsuite Test"
xml="min-test.xml"
xslt="min-test.xslt"
path="pa > ch">
<xsltTestcase match_number="0">
<![CDATA[<ch>child 1</ch>]]>
</xsltTestcase>
<xsltTestcase match_number="1">
<![CDATA[<ch>child 2</ch>]]>
</xsltTestcase>
</xsltTestsuite>
It takes the XML, the XSL and a path in the transformed XML which gets tested. The path can contain a list which elements are identified by their index.
One benefit of this tool is that it can output the results as a junit XML file. This file can be picked up by your Jenkins to show the XLST-tests in your test results. Just add the call to the tool as a build step.
Try Jenni Tennison's Unit Testing Package (XSpec), which is a unit test and behaviour-driven development (BDD) framework for XSLT, XQuery, and Schematron. It is based on the Spec framework of RSpec, which is a BDD framework for Ruby.
With XSpec you can test XLT template wise or XPath wise per your need.
For an overview on how to use/handle/write (installation|execution) click https://github.com/xspec/xspec/wiki/What-is-XSpec

Which is more prefered to be called, "XSL file" or "XSLT file"?

What should I call my file containing XSL code (XSLT code?),
which one sounds more sensible/meaningful.
I know the abbreviation of both of them. But don't know which one to use and where ..
I just now came across these words in w3schools.com
It started with XSL and ended up with
XSLT, XPath, and XSL-FO.
and also saw written "XSL Source code", So what about .. Alejandro's comments .. ?
Now my question reduces to yet more simple version..
Can we call XSLT code as XSL code too? or is it deprecated?
To be nitpicker, I believe XSL is the language and XSLT is a transformation (a piece of code that is written in XSL to transform one XML to another).
So when you're talking of particular pieces of code I think XSLT is more appropriate (just XSLT not XSLT codes), like apha XSLT or so. But when you are talking about language, it should be XSL e.g. XSL skills, XSL code.
This is not only a matter of opinion: just look and see how many questions are in the xslt tag (2299) and how many are in xsl (753) -- the result is clear, isn't it? :)
Related to this in 2010 I proposed that the xsl tag be considered a synonim for the xslt tag. This proposal was voted and approved.
From http://www.w3.org/TR/xslt#section-Introduction
A transformation expressed in XSLT is
called a stylesheet
From http://www.w3.org/TR/xslt20/#what-is-xslt
[Definition: A transformation in the
XSLT language is expressed in the form
of a stylesheet, whose syntax is
well-formed XML [XML 1.0] conforming
to the Namespaces in XML
Recommendation [Namespaces in XML
1.0].]
Leaving aside the formality of the specification, you can call it what you will. With respect to the components of the stylesheet (or transformation): for XSLT 1.0 they are instructions and its contents are templates, for XSLT 2.0 they are declarations and instructions and its contents are sequence constructors
There is no appreciable difference. I use them interchangeably, and so do most people.
Really, does it matter? I'm all for correct terminology, because it usually matters in IT, but in this case you really won't confuse anyone whichever one you use.