Write source XML in an XSLT for development purposes

I have to write an XSLT without knowing the input XML. So I want to start by writing an XSLT that will simply return the input XML without any transformation. Can I do that?

Look at this:
http://mrhaki.blogspot.com/2008/07/copy-xml-as-is-with-xslt.html

<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>

What you want to do is known as the Identity Transform. To be general, you need to ensure that all attribute and non-attribute nodes are copied, recursively:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that the identity transform does not guarantee that the output is identical at the surface (lexical) level; for instance, a hash calculated over the input and output files might differ. Attributes could be reordered, for example, but this has no impact on the infoset or on validity.

Related

xml:space='preserve' doesn't seem to get on with xsl:apply-templates select="node()"

Doing some work with xsl - first time I've done anything serious, and I've hit something which I can't explain. Easiest way to show it is with the identity transform:
This works:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
This doesn't (says "Unable to apply transformation on current source"):
<xsl:template match="@*|node()" xml:space='preserve'>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
This does:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="node()" xml:space='preserve'/>
</xsl:copy>
</xsl:template>
OK, I can see what's happening. But I don't understand why. Why does xml:space not want to play nicely with attributes? Just curious.
BTW, this is using the xsl translator that's built into Notepad++. Perhaps I shouldn't trust it?
What are you trying to accomplish? xml:space="preserve" tells XML-consuming applications that you want to preserve whitespace-only text nodes that are descendants of the element carrying the xml:space attribute. In your last example, xml:space is an attribute of <xsl:apply-templates>, but <xsl:apply-templates> has no whitespace-only text node descendants, so xml:space has no possible effect there.
I think you wanted to preserve whitespace-only text nodes from the input XML document (not from the XSLT stylesheet). In that case, xml:space needs to be in the input XML document, not in the XSLT stylesheet. The stylesheet can declare <xsl:preserve-space elements="*"/>, but that is already the default unless you have an <xsl:strip-space> declaration.
Yes, I would be inclined to wonder whether the XSLT processor used by Notepad++ (libxslt, from the libxml2 project) is doing something non-conformant. As a good diagnostic, try a respected processor such as Saxon and see whether you get any errors.
Either that, or just remove xml:space from your stylesheet, since it won't do you any good even if the processor doesn't throw an error.
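If the aim is to control whitespace coming from the input document, that is done with top-level declarations in the stylesheet rather than with xml:space on instructions. A minimal sketch, assuming a hypothetical input element named pre whose whitespace-only text children should be kept while all others are stripped:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Strip whitespace-only text nodes from every input element... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except those whose parent is <pre> (hypothetical name). The name test
       "pre" is more specific than "*", and conflicts between these two
       declarations are resolved like template rule priorities. -->
  <xsl:preserve-space elements="pre"/>

  <!-- Identity rule: copy attributes and nodes recursively -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Separately, an xml:space="preserve" attribute in the input document keeps whitespace-only text nodes below that element even when xsl:strip-space would otherwise remove them.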
Suggestion:
Just use
<xsl:output method="html" indent="yes"/>
as the first child of <xsl:stylesheet>.
The indent="yes" will prevent all the output elements from being crammed together on one line, so you can read the results.
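A sketch of where the suggested declaration goes, combined with the identity rule. Two assumptions differ from the suggestion: method="xml" is used because the input here is XML (keep "html" if you are producing HTML), and xsl:strip-space is added because indentation tends to look cleaner when existing whitespace-only text nodes are removed first.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Top-level serialization settings; indent="yes" asks the serializer
       to add line breaks and indentation -->
  <xsl:output method="xml" indent="yes"/>
  <!-- Assumption: drop existing whitespace-only text so it does not
       clash with the serializer's own indentation -->
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy attributes and nodes recursively -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

indent="yes" is a request to the serializer, so the exact layout can vary between processors.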
According to the specification, whitespace in attribute values is not preserved the way it is in text nodes; this is discussed in the posting "Preserving attribute whitespace in XSLT".

What does an XSL document look like if it mirrors the input data?

The typical XSL usage is:
XML1.xml -> *transformed using xsl* -> XML2.xml
What does an XSL document look like if I want to simply mirror the input data?
ex:
XML1.xml -> *transformed using xsl* -> XML1.xml
What does an XSL document look like if I want to simply mirror the input data?
There is more than one answer to this question; however, all of them could be called an "identity transform":
1. <xsl:copy-of select="/"/>
This is the shortest, simplest and most efficient identity transform, but also the most inflexible, non-extensible and least useful one.
2. The classical identity rule, which everybody knows (or should know):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This is still a very short, one-template transformation, but it is a much more extensible and useful identity transform, also known as the "identity rule". Using and overriding the identity rule is the most fundamental and powerful XSLT design pattern, making it possible to solve common copy and replace/rename/delete/add problems in just a few lines. Maybe 90%+ of all answers in the xslt tag use this form of the identity transform.
3. The fine-grained control identity rule, which everybody should know (and very few people know):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="@*|node()[1]"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
</xsl:stylesheet>
This is similar to the generally known identity rule defined at 2. above, but it provides a finer control over the XSLT processing.
Typically, with 2. the <xsl:apply-templates select="@*|node()"/> triggers a number of transformations (for all attributes and child nodes) that can be done in any order or even in parallel. There are tasks where we don't want certain types of nodes to be processed after some other nodes, so we have to plug the leaks of the identity rule by overriding it with empty templates matching the unwanted nodes and adding other templates in a specific mode to process these nodes "when the time comes"...
The identity rule in 3. is more appropriate for tasks where we want more control and truly sequential processing.
Some tasks that are very difficult to solve with 2. are easy using 3.
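To illustrate the kind of task meant here, a sketch built on the identity rule from 3. that copies the document but drops a hypothetical <cut/> element together with every sibling that follows it. With this rule a single empty template is enough, because processing moves from one sibling to the next, and a template that does not apply templates to following-sibling::node()[1] ends the chain at that level. The match="/" template is an addition for this sketch, not part of the rule above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Start the chain at the first top-level node. Without this, the built-in
       root rule would process every top-level node, and each one would then
       also process its following siblings (added for this sketch). -->
  <xsl:template match="/">
    <xsl:apply-templates select="node()[1]"/>
  </xsl:template>

  <!-- Fine-grained identity rule: copy this node, process its attributes and
       first child, then hand over to the next sibling -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
  </xsl:template>

  <!-- <cut/> (hypothetical name): copy nothing and do not continue the
       sibling chain, so everything after it at the same level is dropped -->
  <xsl:template match="cut"/>
</xsl:stylesheet>
```

For example, <list><a/><b/><cut/><c/></list> should come out as a list element containing only <a/> and <b/>. With the identity rule from 2., the same result would need something like an extra empty template matching "cut|node()[preceding-sibling::cut]".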
It would look like the identity transform:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This is one of the most fundamental XSLT transforms. It matches any attribute or other node, copies what it matches, and then applies itself to all attributes and child nodes of the matched node.
This turns out to be quite powerful for other tasks, too. A common requirement is to copy most of a source file unchanged, while handling certain elements in a special way. This can be solved using the identity transform plus one template for the special nodes. It's a generic, flexible (and short) solution.
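A minimal sketch of that pattern, assuming hypothetical element names: secret is to be deleted, and old-name is to be renamed to new-name. Everything else is copied unchanged by the identity rule.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy every attribute and node, recursing into children -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Delete: an empty template drops <secret> (hypothetical) and its subtree -->
  <xsl:template match="secret"/>

  <!-- Rename: <old-name> (hypothetical) becomes <new-name>,
       keeping its attributes and content -->
  <xsl:template match="old-name">
    <new-name>
      <xsl:apply-templates select="@*|node()"/>
    </new-name>
  </xsl:template>
</xsl:stylesheet>
```

Both overriding templates win over the identity rule because a plain element name has a default priority of 0, while node() and @* have -0.5.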
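This matches every element and recursively applies templates to its children. Attributes are copied with xsl:copy-of, so the @* part of the match pattern never actually fires. Text nodes are copied by the built-in template rule, but comments and processing instructions are dropped, so this is not a complete identity transform.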
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="* | @*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
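If you prefer this element-only style, here is a sketch of how it could be completed so that comments and processing instructions are copied as well (the built-in rules would otherwise drop them). The extra template is an addition, not part of the answer above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Elements: shallow copy, copy all attributes, then process the children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Comments and processing instructions: copy them explicitly.
       Text nodes are already copied by the built-in text() rule. -->
  <xsl:template match="comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```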

Using XSL to sort attributes

I'm trying to canonicalize the representation of some XML data by sorting each element's attributes by name (not value). The idea is to keep textual differences minimal when attributes are added or removed and to prevent different editors from introducing equivalent variants. These XML files are under source control, and developers want to diff the changes without resorting to specialized XML tools.
I was surprised not to find an XSL example of how to do this. Basically, I want just the identity transform with sorted attributes. I came up with the following, which seems to work in all my test cases:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="*|/|text()|comment()|processing-instruction()">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:sort select="name(.)"/>
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
As a total XSL n00b I would appreciate any comments on style or efficiency. I thought it might be helpful to post it here since it seems to be at least not a common example.
Since XSLT is a functional language, xsl:for-each may often be the easiest path for us humans, but not the most efficient one for XSLT processors, since they cannot fully optimize it.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@*">
<xsl:sort select="name()"/>
</xsl:apply-templates>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|comment()|processing-instruction()">
<xsl:copy />
</xsl:template>
</xsl:stylesheet>
The difference is trivial in this case, though, and as an "XSL n00b" I think you solved the problem very well indeed.
Well done for solving the problem. As I assume you know, the order of attributes is unimportant to XML parsers, so the primary benefit of this exercise is for humans; a machine may reorder them on input or output in unpredictable ways.
Canonicalization in XML is not trivial and you would be well advised to use the canonicalizer provided with any reasonable XML toolkit rather than writing your own.

XSLT to Select Desired Elements When Nested In Not-Desired Elements

What XSLT would I use to extract some nodes to output, ignoring others, when the nodes to be extracted are sometimes nested inside nodes to be ignored?
Consider:
<alpha_top>This prints.
<beta>This doesn't.
<alpha_bottom>This too prints.</alpha_bottom>
</beta>
</alpha_top>
I want a transform that produces:
<alpha_top>This prints.
<alpha_bottom>This too prints.</alpha_bottom>
</alpha_top>
This answer shows how to select nodes based on the presence of a string in the element tag name.
OK, here is a better way:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="beta">
<xsl:apply-templates select="*"></xsl:apply-templates>
</xsl:template>
<xsl:template match="/|*|text()">
<xsl:copy>
<xsl:apply-templates select="*|text()"></xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This basically does an identity transform, but for the element you don't want to include I removed the xsl:copy and only applied templates on the child elements.
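The same idea can be written on top of the full identity rule, so that attributes, comments and processing instructions elsewhere in the document are kept too. A sketch, assuming beta is the only element to unwrap:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy everything by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Unwrap <beta>: emit neither the element (with its attributes) nor its
       own text, but keep processing its child elements -->
  <xsl:template match="beta">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

To unwrap several elements, list them in the pattern, e.g. match="beta|gamma" (gamma being a hypothetical second name).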
The following stylesheet works on your particular case, but I suspect you are looking for something a bit more generic. I'm also sure there is a simpler way.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates select="alpha_top"></xsl:apply-templates>
</xsl:template>
<xsl:template match="alpha_top">
<xsl:copy>
<xsl:apply-templates select="beta/alpha_bottom|text()"></xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="*|text()">
<xsl:copy>
<xsl:apply-templates select="*|text()"></xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I think that once you have a reasonable understanding of how XSLT traversal works (hopefully I answered that in your other question), this becomes quite simple.
You have several choices on how to do this. Darrell Miller's answer shows how to process a whole document and strip out the elements you're not interested in. That's one approach.
Before I go further, I get the impression that you might not entirely 'get' the concept of context in XSLT. This is important and will make your life simpler. At any time in XSLT there is one and only one context node. This is the node (element, attribute, comment, etc.) currently being 'processed'. Inside a template invoked via xsl:apply-templates, the node that has been selected is the context node. So, given your XML:
<alpha_top>This prints.
<beta>This doesn't.
<alpha_bottom>This too prints.</alpha_bottom>
</beta>
</alpha_top>
and the following:
<xsl:apply-templates select='beta'/>
and
<xsl:template match='beta'>...</xsl:template>
the beta node will be the context node inside the template. There's a bit more to it than that but not much.
So, when you start your stylesheet with something like:
<xsl:template match='/'>
<xsl:apply-templates select='alpha_top'/>
</xsl:template>
you are selecting the alpha_top child of the document node (it is the only child element). XPath expressions inside that template are relative to the context node, here the document node.
Now, in that top-level template you might decide that you only want to process your alpha_bottom nodes. Then you could write something like:
<xsl:template match='/'>
<xsl:apply-templates select='//alpha_bottom'/>
</xsl:template>
This would walk down the tree and select all alpha_bottom elements and nothing else.
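As a complete sketch of that idea, here is a variant that uses xsl:copy-of instead of xsl:apply-templates and outputs only the alpha_bottom elements, wherever they are nested. The result wrapper element is a hypothetical addition, since copying several elements directly to the top level would not give a well-formed XML document.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- result is a hypothetical wrapper element -->
    <result>
      <!-- '//' searches the whole tree, starting from the document node -->
      <xsl:copy-of select="//alpha_bottom"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input, the body of the output would be <result><alpha_bottom>This too prints.</alpha_bottom></result>.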
Alternatively you could process all your elements and simply ignore the content of the beta node:
<xsl:template match='beta'>
<xsl:apply-templates/>
</xsl:template>
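(As I mentioned in my other reply to you, xsl:apply-templates with no select attribute is the same as using select='node()', i.e. it selects all child nodes, including text nodes.)
This will not copy the beta element itself, but it will still process all of its children (assuming you have templates for them). Because a bare xsl:apply-templates also selects text nodes, beta's own text is processed too unless a text() template suppresses it.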
So, ignoring elements in your output is basically a matter of using the correct xpath statements in your select attributes. Of course, you might want a good xpath tutorial :)
The probably simplest solution to your problem is this:
<xsl:template match="alpha_top|alpha_bottom">
<xsl:copy>
<xsl:value-of select="text()" />
<xsl:apply-templates />
</xsl:copy>
</xsl:template>
<xsl:template match="text()" />
This does not exhibit the same white-space behavior you have in your example, but this is probably irrelevant.

XSLT template overriding

I have a small question regarding XSLT template overriding.
For this segment of my XML:
<record>
<medication>
<medicine>
<name>penicillin G</name>
<strength>500 mg</strength>
</medicine>
</medication>
</record>
In my XSLT sheet, I have two templates in the following order:
<xsl:template match="medication">
<xsl:copy-of select="." />
</xsl:template>
<xsl:template match="medicine/name">
<text>!unauthorized information!</text>
</xsl:template>
What I want to do is to copy everything under the medication element to the output other than the "name" element (or any other element that I explicitly define). The final xml will be shown to the user in RAW XML form. In other words, the result I want is:
<record>
<medication>
<medicine>
<text>!unauthorized information!</text>
<strength>500 mg</strength>
</medicine>
</medication>
</record>
Instead, I am getting the same XML as the input, i.e. the name element is not replaced by the text element. Any ideas why the second template is not overriding the first one for the name element? Thanks in advance
--
Ali
Template order does not matter. The only case in which it may be considered (and this is processor-dependent) is when you have an unresolvable conflict, i.e. an error condition. In that case, it is legal for the XSLT processor to recover from the error by picking the template that comes last. However, you should never write code that depends on this behavior.
In your case, template priority isn't even the issue. You have two different template rules, one matching <medication> elements and one matching <name> elements. These will never collide, so it's not a question of template priority or overriding. The issue is that your code never actually applies templates to the <name> element. When you say <xsl:copy-of select="."/> on <medication>, you're saying: "perform a deep copy of <medication>". The only way any of the template rules will fire for descendant nodes is if you explicitly apply templates (using <xsl:apply-templates/>).
The solution I have for you is basically the same as alamar's, except that it uses a separate processing "mode", which isolates the rules from all other rules in your stylesheet. The generic match="@* | node()" template causes template rules to be recursively applied to children (and attributes), which gives you the opportunity to override the behavior for certain nodes.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- ...placeholder for the rest of your code... -->
<xsl:template match="/record">
<record>
<xsl:apply-templates/>
</record>
</xsl:template>
<!-- end of placeholder -->
<xsl:template match="medication">
<!-- Instead of copy-of, whose behavior is to always perform
a deep copy and cannot be customized, define your own
processing mode. Rules with this mode name are isolated
from the rest of your code. -->
<xsl:apply-templates mode="copy-medication" select="."/>
</xsl:template>
<!-- By default, copy all nodes and their descendants -->
<xsl:template mode="copy-medication" match="@* | node()">
<xsl:copy>
<xsl:apply-templates mode="copy-medication" select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- But replace <name> -->
<xsl:template mode="copy-medication" match="medicine/name">
<text>!unauthorized information!</text>
</xsl:template>
</xsl:stylesheet>
The rule for "medicine/name" overrides the rule for "@* | node()" because the form of the pattern (which contains a "/") gives it a default priority of 0.5, which is higher than the default priority of "node()" (-0.5).
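If you would rather not depend on default priorities at all, XSLT 1.0 lets you state them with the priority attribute on xsl:template. A sketch without the separate mode, where the identity rule's default priority is written out and the replacement rule is given an explicit, higher one (the value 1 is an arbitrary choice):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule; -0.5 is the default priority of node() and @*,
       written out explicitly here -->
  <xsl:template match="@*|node()" priority="-0.5">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replacement for <name> inside <medicine>; priority 1 is higher than
       the identity rule's, so this template always wins for those nodes -->
  <xsl:template match="medicine/name" priority="1">
    <text>!unauthorized information!</text>
  </xsl:template>
</xsl:stylesheet>
```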
A complete but concise description of how template priority works can be found in "How XSLT Works" on my website.
Finally, I noticed you mentioned you want to display "RAW XML" to the user. Does that mean you want to display, for example, the XML, with all the start and end tags, in a browser? In that case, you'd need to escape all markup (e.g., "&lt;" for "<"). Check out the XML-to-string utility on my website. Let me know if you need an example of how to use it.
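To give an idea of what such escaping involves, here is a deliberately small sketch (not the utility mentioned above) that renders elements, attributes and text as visible markup inside an HTML pre element. It assumes no namespaces, skips comments and processing instructions, and does not escape quote characters or ampersands inside the displayed values.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Wrap the rendered markup in <pre> so line breaks in the source survive -->
  <xsl:template match="/">
    <html>
      <body>
        <pre><xsl:apply-templates mode="show"/></pre>
      </body>
    </html>
  </xsl:template>

  <!-- Element: write "<name attrs>", the rendered children, then "</name>".
       The serializer writes the generated "<" characters as &lt; in the HTML,
       so the browser displays them instead of interpreting them. -->
  <xsl:template match="*" mode="show">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:apply-templates select="@*" mode="show"/>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="show"/>
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

  <!-- Attribute: write ' name="value"' -->
  <xsl:template match="@*" mode="show">
    <xsl:text> </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>="</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>"</xsl:text>
  </xsl:template>

  <!-- Text: output as-is; the serializer escapes it for HTML -->
  <xsl:template match="text()" mode="show">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions are skipped in this sketch -->
  <xsl:template match="comment()|processing-instruction()" mode="show"/>
</xsl:stylesheet>
```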
Add
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
next to your <xsl:template match="medicine/name">
And remove <xsl:template match="medication"> altogether!
<?xml version="1.0" encoding="windows-1251"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="medicine/name">
<text>!unauthorized information!</text>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>