Resource Explaining XSLT Processing Path

What is the best resource to learn the principles by which XSLT applies template rules?
Questions like this seem like they should be relatively easy to answer, and certainly so after some study. I'm almost embarrassed to post them. But I have looked at Kay's XSLT Programmer's Reference, the XSLT Cookbook, and Learning XSLT, and I still cannot find a clear explanation of how a node and its children will be processed by a set of rules. Maybe I'm an idiot, but I haven't found Python, Linux, Apache, MySQL, or bash to be anything like XSLT for sheer frustration.
UPDATE Thank you for your answers. I won't be able to pick this up again for several days, but I do appreciate the help.

This section in the specification on XSLT Template Rules is fairly straight-forward, and gives examples.
Don't think of XSLT as acting on your XML.
Think of your XML as flowing through XSLT.
XSLT starts at the root template, with an imaginary cursor at the root level of your document tree:
<xsl:template match="/">
...stuff in here...
</xsl:template>
In the middle, XSLT will start executing statements. If you have an <xsl:apply-templates select="...something..."/>, then it will select every node that the XPath in the select= finds relative to the cursor, and for each of those nodes look for a template whose <xsl:template match="...something..."> pattern matches it. If several templates match, the one that wins according to the precedence rules is executed.
While executing that new template, the same things apply, except the context is now the node that matched.
Those are the basics. So for example, if you have this XSLT program, then no matter what the input, you'll get <hello world="!"/> as the output:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<hello world="!"/>
</xsl:template>
</xsl:stylesheet>
But if you have input like this:
<people>
<name>Paige</name>
<name>Hayley</name>
<name>Hamlet</name>
</people>
and use this as a transform:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<table>
<xsl:apply-templates select="/people/name[substring(., 1, 1) = 'H']"/>
</table>
</body>
</html>
</xsl:template>
<xsl:template match="name">
<tr>
<td>
<xsl:value-of select="."/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
It will first set up an HTML file. It will then find all of the 'name' elements under the 'people' elements under the root element, and if they start with the letter 'H' a template will be searched for that matches 'name'. Since we have one, it will write out a table row with the name. It will execute that rule twice, and then the HTML elements will be closed, and the script will end.
Does this help?

The answers above all do a good job of explaining what happens if you have templates defined but it's very important to understand the built-in behaviour of XSLT as well.
Template processing is driven by the XSLT engine itself not (in general) by your code. In that way, it's very different to the procedural languages you've mentioned. If you have any background in functional programming that will help a great deal.
The initial behaviour of XSLT is to match the document node. The document node is an 'imaginary' node that acts as the parent of your XML document's root element. It represents the entire document. The built-in behaviour is effectively a template rule that looks like:
<xsl:template match='*|/'><xsl:apply-templates/></xsl:template>
The match pattern fits the document node or any element, and the body applies templates to the children of whatever was matched. Processing therefore starts at the document node and then traverses your document. Think of the document as a tree. At each element node it will execute exactly the same rule. XSLT traverses nodes in document order (so if your root element has two children it will hit the first one in the document before the second). Since each element's children are processed before the engine moves on to that element's next sibling, this is a depth-first, left-to-right traversal of the tree.
Now, at each element node the XSLT engine hits it will look for a matching template. The rules are relatively simple - it will choose to execute the most specific template. The built-in template is always the least specific. A template matching a full path is very specific:
<xsl:template match='/some/path/to/a/node'>...</xsl:template>
A template matching just a node name is less specific:
<xsl:template match='node'>...</xsl:template>
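To make the "most specific wins" rule concrete, here is a minimal sketch with two templates that both match the same element. It assumes a people/name input like the one used earlier in this thread; the specific and general output elements are made up for illustration. Under the default priorities in the XSLT 1.0 specification, a pattern containing a path step gets priority 0.5 and a bare element name gets 0, so the first template is chosen for every name element.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Pattern with a path step: default priority 0.5, so this one wins -->
  <xsl:template match="people/name">
    <specific><xsl:value-of select="."/></specific>
  </xsl:template>
  <!-- Bare element name: default priority 0 -->
  <xsl:template match="name">
    <general><xsl:value-of select="."/></general>
  </xsl:template>
</xsl:stylesheet>
```

Note that the default priority depends on the form of the pattern, not on the length of the path: a long absolute path and people/name both get 0.5. If you need a finer ordering, set the priority attribute on xsl:template explicitly.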
If you have defined a template that the engine selects (any template you defined is going to get used in preference to a built-in), the default traversal above stops. It executes your template and stops unless your template starts a traversal again:
<xsl:template match='node'>
<p><xsl:value-of select='text()'/></p>
<xsl:apply-templates/>
</xsl:template>
That apply-templates above restarts our traversal (btw, apply-templates with no select attribute is the same as using select='node()').
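For reference, the built-in rules described in the XSLT 1.0 specification can be written out as an ordinary stylesheet. The sketch below should behave like a stylesheet with no templates at all: it walks the whole tree and emits only the text. This is also why stray text shows up in your output when no template of yours claims a node.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Elements and the document node: keep walking down to the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text and attribute nodes: copy the string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Comments and processing instructions: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```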
I hope that helps. This is one of those situations where a diagram is the best possible approach.

If you have some money to spend on training, Ken Holman has an excellent set of XML/XSLT/XPATH/XSL-FO training courses.
http://www.cranesoftwrights.com/training/ptux/ptux-video.htm He links to some sample videos.
I have attended his training sessions in-person. He is very thorough and explains the processing model, functions, and aspects of XML/XSLT/XPATH. It is important to understand how the node trees are processed and how an XSLT engine "walks the tree". Then XSLT templates and the distinction between "push" and "pull" really make sense.
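As a rough illustration of the "push" versus "pull" distinction, the sketch below builds the same list twice from a people/name input like the one shown earlier in this thread. The lists, pull, push, and item element names are invented for the example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <lists>
      <!-- Pull: this template reaches into the input and loops itself -->
      <pull>
        <xsl:for-each select="people/name">
          <item><xsl:value-of select="."/></item>
        </xsl:for-each>
      </pull>
      <!-- Push: nodes are handed to the engine, which finds a matching rule -->
      <push>
        <xsl:apply-templates select="people/name"/>
      </push>
    </lists>
  </xsl:template>
  <!-- The rule the engine picks for each pushed name element -->
  <xsl:template match="name">
    <item><xsl:value-of select="."/></item>
  </xsl:template>
</xsl:stylesheet>
```

Both halves produce the same item elements here. The push style tends to pay off once the input has mixed or nested content, because each kind of node gets its own small rule.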
XSLT requires a different way of looking at things. Many programmers have a hard time adjusting or understanding XSLT because they keep thinking of things in terms of procedural code, rather than a more functional manner.

Related

How to use xslt to alter xml namespaces

I'm trying to use XSLT to change the namespaces in an XML file. I was hoping to get something a bit closer than I have so far but I'm going around in circles so I thought I'd ask the question a little earlier than I ordinarily would...
My XML file is
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ns2:apple xmlns:ns2="http://veg.com/app/api/apple" xmlns:ns1="http://veg.com/app/api" xmlns:ns3="http://veg.com/app/api/apple/red"
xmlns:ns4="http://veg.com/app/banana" xmlns:ns5="http://veg.com/app/api/pear" xmlns:ns6="http://veg.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
<ns2:name>granny smith</ns2:name>
<ns2:flavour>sweet</ns2:flavour>
<ns2:origin>southwest region</ns2:origin>
</ns2:apple>
The only part I want to change is the URLs in the namespace attributes of the root element (veg to fruit):
<ns2:apple xmlns:ns2="http://fruit.com/app/api/apple" xmlns:ns1="http://fruit.com/app/api" xmlns:ns3="http://fruit.com/app/api/apple/red"
xmlns:ns4="http://fruit.com/app/banana" xmlns:ns5="http://fruit.com/app/api/pear" xmlns:ns6="http://fruit.com/app/api/orange"
ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
I've tried a few things but have failed spectacularly so far. My last attempt was
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
<xsl:element name="{local-name()}" namespace="http://fruit.com/app/api/apple">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
which has given me a ridiculous result -
<?xml version="1.0" encoding="UTF-8"?><ns0:apple xmlns:ns0="http://fruit.com/app/api/apple" ns1:created="2016-05-23T16:47:55+01:00" ns1:href="http://falseserver:8080/app/api/apple/1" ns1:id="1">
<ns1:name xmlns:ns1="http://fruit.com/app/api/apple">granny smith</ns1:name>
<ns2:flavour xmlns:ns2="http://fruit.com/app/api/apple">sweet</ns2:flavour>
<ns3:origin xmlns:ns3="http://fruit.com/app/api/apple">southwest region</ns3:origin>
</ns0:apple>
I think my first and biggest issue is trying to do the correct match. As you can see I have resorted to a * but this is out of desperation rather than because I think it's the right way to go! I'm also not sure why it drops all the other namespace attributes from the tag but this also seems to happen quite consistently in all the ways I've tried so far. I've no idea how I've managed to get the new 'tag namespaces' in the document but I suspect if I could at least get the start of the xsl doc correct I'd be much closer to the answer...
In the code provided (your "spectacular failure") you have left the XSLT processor to think up new prefixes. When you don't specify a prefix in xsl:element, an XSLT 1.0 processor is expected to dream one up. This changes in XSLT 2.0, where the processor is required to put the element in the default namespace. If you want to use the prefix ns2, then specify
<xsl:element name="ns2:{local-name()}" namespace="http://fruit.com/app/api/apple">
which will work in either release (except that in XSLT 1.0, processors have blanket permission to change namespace prefixes any way they want: but most processors don't do this in ordinary circumstances).
This will work for the namespaces that are actually used in element names. It will drop namespaces that are unused. Your desired output, however, retains the unused namespace declarations from the source document, with their URIs changed.
If you know that the root element of your document is always ns2:apple, then generate that element in the output using a literal result element with all the desired namespaces.
If you don't know the name of the root element in advance, then controlling its namespace declarations is quite hard to achieve in XSLT 1.0 (and rarely needed!). In XSLT 2.0 you can do it using the xsl:namespace instruction.
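A minimal sketch of the literal-result-element approach, assuming the root is always ns2:apple, that attributes live in the old ns1 namespace, and that child elements live in the old ns2 namespace, as in the question's sample. The old1 and old2 prefixes are names chosen here for the veg.com URIs; the fruit.com URIs are taken from the desired output.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:old1="http://veg.com/app/api"
    xmlns:old2="http://veg.com/app/api/apple"
    xmlns:ns1="http://fruit.com/app/api"
    xmlns:ns2="http://fruit.com/app/api/apple"
    xmlns:ns3="http://fruit.com/app/api/apple/red"
    xmlns:ns4="http://fruit.com/app/banana"
    xmlns:ns5="http://fruit.com/app/api/pear"
    xmlns:ns6="http://fruit.com/app/api/orange"
    exclude-result-prefixes="old1 old2">

  <!-- Root: a literal result element carries the stylesheet's in-scope
       namespaces (minus the excluded ones) into the output -->
  <xsl:template match="/old2:apple">
    <ns2:apple>
      <xsl:apply-templates select="@*|node()"/>
    </ns2:apple>
  </xsl:template>

  <!-- Other elements in the old apple namespace: same local name, new URI -->
  <xsl:template match="old2:*">
    <xsl:element name="ns2:{local-name()}" namespace="http://fruit.com/app/api/apple">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Attributes in the old api namespace: same local name, new URI -->
  <xsl:template match="@old1:*">
    <xsl:attribute name="ns1:{local-name()}" namespace="http://fruit.com/app/api">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <!-- Everything else (text, comments, other nodes) is copied unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The unused ns3 to ns6 declarations reach the output only because they are in scope on the literal ns2:apple element in the stylesheet.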

XSLT: <apply-templates select="...">

I have a question regarding <xsl:apply-templates>.
Let's assume I have an XML like this (see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-subst.html):
<transcription>
<subst>
<del>wrong</del>
<add>right</add>
</subst>
</transcription>
Now I want to process this recording of a transcription in different ways using XSLT.
If I just want to present the correction to the user, I could use an XSLT template like this:
<xsl:template match="subst"><xsl:apply-templates select="./add"/></xsl:template>
<xsl:template match="subst/add"><xsl:apply-templates/></xsl:template>
However, I could also write:
<xsl:template match="subst"><xsl:apply-templates/></xsl:template>
<xsl:template match="subst/add"><xsl:apply-templates/></xsl:template>
<!-- del: ignore contents -->
<xsl:template match="subst/del"></xsl:template>
In the first case, I explicitly only address add inside <subst>, ignoring <del>.
In the second case, I ignore <del> by providing a template that does not do anything with the element, resulting in the same effect.
If I am not mistaken, the two ways are equivalent. Which one is preferable?
IMHO, not processing nodes at all is preferable to processing them with an empty template. But sometimes the alternative is more convenient, e.g. for reasons of code readability.
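One difference worth checking before treating the two variants as interchangeable: a bare xsl:apply-templates selects all child nodes of subst, including the whitespace-only text nodes between del and add, and the built-in text rule copies those to the output. A sketch of the second variant that avoids this, assuming the whitespace inside subst is not significant:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Drop whitespace-only text nodes that are direct children of subst -->
  <xsl:strip-space elements="subst"/>
  <xsl:template match="subst"><xsl:apply-templates/></xsl:template>
  <xsl:template match="subst/add"><xsl:apply-templates/></xsl:template>
  <!-- del: ignore contents -->
  <xsl:template match="subst/del"/>
</xsl:stylesheet>
```

The first variant, with select="./add", never selects those text nodes in the first place.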

Replacing NameSpaces one from Another using XSLT

I have an XML input file containing a namespace that I want to replace with another one using XSLT. I am new to XSLT, so I have not been able to find a proper solution. Below are the XML input payload and the output payload that I want. Could anyone help me with that?
Input Payload:
<ns:createOrderResponse xmlns:ns="http://services.oms.ecom.ecc.com"><ns:return type="com.ecc.ecom.oms.beans.xsd.CreateOrderResponse"><ns:omsGeneratedOrderId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" /><ns:responseCode>99</ns:responseCode><ns:responseDesc>INVALID ORDER</ns:responseDesc><ns:sellerSiteId>10196</ns:sellerSiteId><ns:serverProcElapsedTime>8</ns:serverProcElapsedTime><ns:siteGeneratedOrderId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" /><ns:subResponse><responseCode xmlns="http://beans.oms.ecom.ecc.com/xsd">1144</responseCode><responseDescription xmlns="http://beans.oms.ecom.ecc.com/xsd">Order Total mismatch</responseDescription></ns:subResponse><ns:subResponse><responseCode xmlns="http://beans.oms.ecom.ecc.com/xsd">1147</responseCode><responseDescription xmlns="http://beans.oms.ecom.ecc.com/xsd">Order Grand Total and sum of OrderItem Grand Total mismatch</responseDescription></ns:subResponse><ns:transactionNumber>0717299145</ns:transactionNumber></ns:return></ns:createOrderResponse>
Desired Output:
<ns:createOrderResponse xmlns:ns="http://services.oms.ecom.ecc.com"><ns:return type="com.ecc.ecom.oms.beans.xsd.CreateOrderResponse"><ns:omsGeneratedOrderId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" /><ns:responseCode>99</ns:responseCode><ns:responseDesc>INVALID ORDER</ns:responseDesc><ns:sellerSiteId>10196</ns:sellerSiteId><ns:serverProcElapsedTime>8</ns:serverProcElapsedTime><ns:siteGeneratedOrderId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true" /><ns:subResponse><responseCode xmlns="http://services.oms.ecom.ecc.com">1144</responseCode><responseDescription xmlns="http://services.oms.ecom.ecc.com">Order Total mismatch</responseDescription></ns:subResponse><ns:subResponse><responseCode xmlns="http://services.oms.ecom.ecc.com">1147</responseCode><responseDescription xmlns="http://services.oms.ecom.ecc.com">Order Grand Total and sum of OrderItem Grand Total mismatch</responseDescription></ns:subResponse><ns:transactionNumber>0717299145</ns:transactionNumber></ns:return></ns:createOrderResponse>
Basically I want to replace:
Namespace: http://beans.oms.ecom.ecc.com/xsd
with
Namespace: http://services.oms.ecom.ecc.com
CAVEAT: The following is untested but demonstrates the ideas.
If you know which specific elements and attributes to expect, that's relatively easy -- it's the same as any other "when you see this element, output this other element instead" template, recursing to cover the whole document. Start with the identity stylesheet and add appropriate templates of the form:
<xsl:template match="wrongnamespace:fred"
xmlns:wrongnamespace="http://beans.oms.ecom.ecc.com/xsd">
<rightnamespace:fred xmlns:rightnamespace="http://services.oms.ecom.ecc.com">
<xsl:apply-templates select="@*|node()"/>
</rightnamespace:fred>
</xsl:template>
and (for attributes)
<xsl:template match="@wrongnamespace:george"
xmlns:wrongnamespace="http://beans.oms.ecom.ecc.com/xsd">
<!-- xsl:attribute takes its value from its content, not from a value attribute -->
<xsl:attribute name="ns:george"
namespace="http://services.oms.ecom.ecc.com">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
... and so on.
If you don't know what elements or attributes to expect, this becomes uglier, because you have to match on the namespace URI rather than on specific names. Sticking with 1.0 because it's more widely supported, the easiest solution is a pair of templates with an explicit priority, so that they win over the identity template: one that catches attributes in the old namespace, and one that catches elements in it.
<xsl:template match="@*[namespace-uri() = 'http://beans.oms.ecom.ecc.com/xsd']"
priority="1">
<!-- name is an attribute value template, hence the curly braces -->
<xsl:attribute name="{concat('ns:', local-name())}"
namespace="http://services.oms.ecom.ecc.com">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<xsl:template match="*[namespace-uri() = 'http://beans.oms.ecom.ecc.com/xsd']"
priority="1">
<xsl:element name="{concat('ns:', local-name())}"
namespace="http://services.oms.ecom.ecc.com">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
Again, adding these to the canonical identity stylesheet will do what you need; it'll catch and change these two cases while copying everything else unaltered.
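Put together with the identity template, a complete stylesheet might look like the sketch below. It is untested against real data and assumes the namespace URIs that appear in the question's input and desired output payloads. The payload has no attributes in the old namespace, so only the element template is included.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy everything that no other template claims -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Elements in the old namespace: same local name, new namespace URI.
       The predicate gives this pattern a higher default priority than the
       identity template, so it wins without an explicit priority. -->
  <xsl:template match="*[namespace-uri() = 'http://beans.oms.ecom.ecc.com/xsd']">
    <xsl:element name="{local-name()}" namespace="http://services.oms.ecom.ecc.com">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Whether the rewritten elements are serialized with a default namespace declaration or with a prefix is up to the processor; both forms are equivalent to a namespace-aware consumer.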
CAVEAT: THIS WILL NOT CHANGE URIS APPEARING IN STRINGS. You could add a template which looks for text() nodes containing the old URI and outputs the new one instead; I'm leaving that as an exercise for the reader.
CAVEAT: THIS WILL NOT CHANGE NAMESPACE NODES. You may wind up with leftover declarations of the old prefix and namespace. Another template could be added to clean that up; this too is left as an exercise for the reader.
That should give you enough to get you started. Have fun. Programming in XSLT does require learning to think in terms of rules and replacements rather than procedures, but when you get used to that it can be a very expressive tool.
(Standard plug: If you want an off-the-shelf tested solution, you can probably find one on Dave Pawson's XSL FAQ website. That's a resource you really need to be aware of if working in XSLT; it'll save you a lot of work, and it has some solutions that are very clever and very non-obvious.)

Actually XSLT Lookup (Store variables during loop and use it in another template)

This question actually asks something quite different. See the comments to @Tomalak's answer to understand what the OP really wanted. :(
Is there a way to store a variable/param during a for-each loop in a sort of array, and use it in another template, namely <xsl:template match="Foundation.Core.Classifier.feature">.
All the classname values that appear during the for-each should be stored. How would you implement that in XSLT? Here's my current code.
<xsl:for-each select="Foundation.Core.Class">
<xsl:for-each select="Foundation.Core.ModelElement.name">
<xsl:param name="classname">
<xsl:value-of select="Foundation.Core.ModelElement.name"/>
</xsl:param>
</xsl:for-each>
<xsl:apply-templates select="Foundation.Core.Classifier.feature" />
</xsl:for-each>
Here's the template in which the classname parameters should be used.
<xsl:template match="Foundation.Core.Classifier.feature">
<xsl:for-each select="Foundation.Core.Attribute">
<owl:DatatypeProperty rdf:ID="{Foundation.Core.ModelElement.name}">
<rdfs:domain rdf:resource="$classname" />
</owl:DatatypeProperty>
</xsl:for-each>
</xsl:template>
The input file can be found at http://krisvandenbergh.be/uml_pricing.xml
No, it is not possible to store a variable in a for-each loop and use it later.
This is because variables are write-once in XSLT (once set they are immutable) and they are strictly scoped within their parent element. Once processing leaves the for-each loop, the variable is gone.
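What XSLT does offer is handing a value down explicitly when one template invokes another, using xsl:with-param and a matching xsl:param. The sketch below keeps the asker's second template as it is apart from the parameter. It assumes that Foundation.Core.ModelElement.name is a direct child of Foundation.Core.Class, and it uses the standard OWL, RDF, and RDFS namespace URIs, which the question does not show.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <xsl:template match="Foundation.Core.Class">
    <!-- Pass the class name along with the nodes being processed -->
    <xsl:apply-templates select="Foundation.Core.Classifier.feature">
      <xsl:with-param name="classname" select="Foundation.Core.ModelElement.name"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="Foundation.Core.Classifier.feature">
    <!-- Receives the value supplied by the caller -->
    <xsl:param name="classname"/>
    <xsl:for-each select="Foundation.Core.Attribute">
      <owl:DatatypeProperty rdf:ID="{Foundation.Core.ModelElement.name}">
        <!-- Curly braces are needed to evaluate $classname inside an attribute -->
        <rdfs:domain rdf:resource="{$classname}"/>
      </owl:DatatypeProperty>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```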
XSLT does not work as an imperative programming language, but that's what you seem to be trying here. You don't need <xsl:for-each> in 98% of all cases and should not use it because it clogs your view of how XSLT works. To improve your XSLT code, get rid of all <xsl:for-each> loops you have (all of them, I mean it) and use templates instead:
<xsl:template match="Foundation.Core.Class">
<xsl:apply-templates select="
Foundation.Core.Classifier.feature/Foundation.Core.Attribute
" />
</xsl:template>
<xsl:template match="Foundation.Core.Attribute">
<owl:DatatypeProperty rdf:ID="{Foundation.Core.ModelElement.name}">
<rdfs:domain rdf:resource="{
ancestor::Foundation.Core.Class[1]/Foundation.Core.ModelElement.name[1]
}" />
</owl:DatatypeProperty>
</xsl:template>
(I'm not sure if the above is what you actually want, your question is rather ambiguous.)
Note the use of the XPath ancestor axis to refer to an element higher in the hierarchy (you seem to want the <Foundation.Core.ModelElement.name> of the parent class).
PS: Your XML is incredibly bloated and strongly redundant due to structured element names. Structure should come from... well... structure, not from elements like <Foundation.Core.Classifier.feature>. I'm not sure if you can do anything about it, though.
Addition:
To solve your xmi.id / xmi.idref problem, the best way is to use an XSL key:
<!-- this indexes all elements by their @xmi.id attribute -->
<xsl:key name="kElementByIdref" match="*[@xmi.id]" use="@xmi.id" />
<!-- now you can do this -->
<xsl:template match="Foundation.Core.DataType">
<dataTypeName>
<!-- pull out the corresponding element from the key, output its value -->
<xsl:value-of select="key('kElementByIdref', @xmi.idref)" />
</dataTypeName>
</xsl:template>
To better understand how keys work internally, you can read this answer I gave earlier. Don't bother too much with the question, just read the lower part of my answer, I explained keys in terms of JavaScript.
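A slightly narrower variant of the same idea, as a sketch: index only the defining DataType elements, match only the referencing ones, and output the name child rather than the whole string value of the element. The key name kDataTypeById is made up here, and the element layout is assumed to be the one in the asker's fragments.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index the defining DataType elements by their @xmi.id -->
  <xsl:key name="kDataTypeById"
           match="Foundation.Core.DataType[@xmi.id]"
           use="@xmi.id"/>
  <!-- A reference: look up the definition and output its name -->
  <xsl:template match="Foundation.Core.DataType[@xmi.idref]">
    <dataTypeName>
      <xsl:value-of
          select="key('kDataTypeById', @xmi.idref)/Foundation.Core.ModelElement.name"/>
    </dataTypeName>
  </xsl:template>
</xsl:stylesheet>
```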
Ok, I now understand why for-each is not always needed. Consider the code below.
<Foundation.Core.DataType xmi.id="UID71848B1D-2741-447E-BD3F-BD606B7FD29E">
<Foundation.Core.ModelElement.name>int</Foundation.Core.ModelElement.name>
</Foundation.Core.DataType>
It has an id UID71848B1D-2741-447E-BD3F-BD606B7FD29E
Way elsewhere I have the following.
<Foundation.Core.StructuralFeature.type>
<Foundation.Core.DataType xmi.idref="UID71848B1D-2741-447E-BD3F-BD606B7FD29E"/>
</Foundation.Core.StructuralFeature.type>
As you can see, both fragments have the same ID. Now I want to output "int" every time this ID appears somewhere in the document. So basically xmi.idref="UID71848B1D-2741-447E-BD3F-BD606B7FD29E" should be replaced by int, which can be easily derived from the Foundation.Core.ModelElement.name element.
Above is the main reason why I would like to store it in a variable. I don't get it how this can be dealt with using XSLT. If someone could elaborate on this, I hope there exists some kind of pattern to solve such a problem, since I need it quite often. What would be a good approach?
I understand this is maybe a bit off-topic, but I am willing to ask it in this thread anyway since this problem is very close to it.

Varying xpath-default-namespace in XML source files

I have a set of XML files that I am processing with an XSL transform. They have a default namespace, so my XSL transform must contain the declaration:
xpath-default-namespace="urn:CZ-RVV-IS-VaV-XML-NS:data-1.2.2"
The problem is that this value changes from time to time, and my transform suddenly stops working, until I look at an example from the new file, extract this namespace ID and put it in the transform, whereby the transform stops working for old files. Is there a way to pass this as a parameter, or set it somehow at runtime? I have tried the parameter syntaxes that I looked up in various tutorials, but none have worked for this particular use.
I have searched all sorts of forums and found references to namespace-agnostic coding of XSL, but not figured out how to do it. Ian Williams' book "XSLT and Xpath" states that the default namespace must be declared, or you get nothing in the output stream, which is how it has worked for me. But I really don't want to have to change this by hand regularly, I want to give the user something that will work, without needing constant attention from me.
The only 100% reliable way I have invented so far is to use a standard programming language to open both the XML source and XSL transform as text files, extract the URI from the XML source, paste it into the XSL transform, close both files and then, finally run the actual transform. This works, but is incredibly dorky, at least to my taste. How can I better deal with changing default namespaces?
Pete
The value of xpath-default-namespace must be a static URI, so you'll have to pre-process the stylesheet if you want it to vary. One way to do that would be to use XSLT. Apply the following meta-stylesheet to your primary stylesheet each time, and then invoke the pre-processed result instead.
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Pass in the new namespace URI as a stylesheet parameter -->
<xsl:param name="new-uri" required="yes"/>
<!-- By default, copy everything as is -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- But update the value of @xpath-default-namespace -->
<xsl:template match="@xpath-default-namespace">
<xsl:attribute name="{name()}" namespace="{namespace-uri()}">
<xsl:value-of select="$new-uri"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
This is a bit of a strange use case though, because namespaces weren't really designed to be so dynamic. They were designed to qualify names, i.e. make up part of a name. When you look at it that way, dynamic namespaces don't make a lot of sense. Imagine a database whose table and field names arbitrarily changed every once in a while, forcing you to rewrite all your SQL scripts to keep up with the changes. That's what this is akin to.
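If pre-processing the stylesheet is not an option, the namespace-agnostic style the question mentions is another route. In XSLT 2.0 a name test of the form *:local-name matches that local name in any namespace. The sketch below uses hypothetical element names (record, title, row), since the question does not show the actual vocabulary.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- *:record matches an element with local name "record" in any namespace.
       An XSLT 1.0 equivalent would be *[local-name() = 'record']. -->
  <xsl:template match="*:record">
    <row>
      <xsl:value-of select="*:title"/>
    </row>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is that such templates also match same-named elements from unrelated namespaces, so this fits best when the only thing that varies is the version suffix of a single vocabulary.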
Have you tried defining a stylesheet parameter <xsl:param name="xpdn"/> and using it in the stylesheet declaration or top level template declaration as in
<xsl:template match="...." xpath-default-namespace="$xpdn">
I can't find anything in the spec that says this won't work (but I'm not in a position to try it just now).