XPath reference - xslt

I don't know the proper terminology for what I'm looking for, but what I am looking for is a complete reference to the statements that can go between the double quotes, things like *, node(), @*, and all the ones listed here plus any others that exist.
<xsl:template match="*">
The answer I linked to provides some detail, but not enough. For instance, that answer says "can be applied to any element" about the example I gave above, but what is considered an "element" in Xpath? What does node() include? What statements include attributes? etc.
I have searched the references here and here and I'm slowly making my way through this book, but I'm not seeing the info I want, which is basically a consolidated (and hopefully exhaustive) list of statements and what they mean. Does such a list exist and if so where is it? Free is nice but not necessary.

In XSLT, the match pattern accepts a subset of XPath expressions. So the set of expressions which can appear as the value of a match attribute is governed by two specs: the XPath specification itself, which defines the language of which match-patterns are a subset, and the XSLT specification, which defines the subset.
If you are working with XSLT 1.0, the authoritative account is given by the XPath 1.0 specification and the XSLT 1.0 specification. It is in the nature of XPath that the language is infinite in size; there cannot be an exhaustive list of legal patterns. Instead, the set of legal patterns is defined by a context-free grammar given in the XSLT and XPath specs.
If you are working with XSLT 2.0, the relevant specs are XPath 2.0 (Second Edition) and XSLT 2.0. Again, the definition of legal match patterns uses a grammar defined partly in the XSLT spec and partly in the XPath spec.
You ask what is considered an "element" in Xpath? What does node() include? What statements include attributes? etc.
Both versions of XPath define how to evaluate expressions against instances of the XPath data model; it is the data model that specifies that all element nodes are nodes, but not all nodes are element nodes (and so on). The data model for XPath 1.0 is simpler and in general easier to understand, but its definition is rather informal and has what some readers regard as problematic gaps and contradictions; it is defined in section 5 of the XPath 1.0 specification. The XPath 2.0 data model is used not only by XPath and XSLT but also by XQuery; it is defined in a spec called, unsurprisingly, XQuery 1.0 and XPath 2.0 Data Model (XDM).
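To make the node kinds concrete, here is a minimal XSLT 1.0 sketch with one template per kind of pattern. The output element names (element, attribute, text) are made up for illustration; only the match patterns come from the specifications. In a pattern, node() on its own matches children of any kind (elements, text, comments, processing instructions) but not attributes, which is why @* has to be written separately.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "*" matches element nodes only -->
  <xsl:template match="*">
    <element name="{name()}">
      <!-- attributes are not children, so they are selected explicitly with @* -->
      <xsl:apply-templates select="@*|node()"/>
    </element>
  </xsl:template>

  <!-- "@*" matches attribute nodes -->
  <xsl:template match="@*">
    <attribute name="{name()}" value="{.}"/>
  </xsl:template>

  <!-- "text()" matches text nodes -->
  <xsl:template match="text()">
    <text><xsl:value-of select="."/></text>
  </xsl:template>

  <!-- the remaining child node kinds; an empty template drops them -->
  <xsl:template match="comment()|processing-instruction()"/>

</xsl:stylesheet>
```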
A good book on XSLT will typically also provide a good account of the data model; depending on the style of the book, of course, it may be more or less exhaustive and be more or less careful about corner cases. There are several good books, and I have heard people say good things about Doug Tidwell's book. But the one XSLT book I have found on every serious XSLT programmer's shelf is the one written by Michael Kay. (Actually, most serious XSLT programmers I know own two: the XSLT 1.0 version and the XSLT 2.0 version.)
From the wording of your question, it sounds as if you may also want to read some systematic introductions to XML itself.

I'm reminded a little of the manager who asks you for plans to rid the world of cancer, and insists they must be presented tomorrow on a single sheet of A4 paper. You've discovered that you need more technical detail than the simpler "single-sheet" references provide, but you still value their simplicity: you are asking for completeness and brevity at the same time, and that's a tough order.
I think you're actually well on the way to answering your own question. You've discovered that you need a better understanding of the data model, as this underpins the semantics of all the XPath expressions and XSLT patterns that you need to write. As Michael Sperberg-McQueen points out, there's an admirably concise but lamentably informal description of the model in the XPath 1.0 specification, and an admirably detailed but lamentably verbose description in the XDM spec linked from XSLT 2.0 and XQuery 1.0. Equally, you've also discovered that any short reference to the XPath (or pattern) grammar and semantics is going to be incomplete, but any longer description is going to take time to absorb. So you know the choices you have to make!

An element is an XML structure like <Something>, an attribute looks like Something="value", and both can be referred to as nodes.
I think a good reference is the XPath specification itself. It takes a while to read it all, and some more time to understand it, but it's a good place to pick up the terminology you need to formulate more specific questions.

Related

Upgrading XSLT 1.0 to XSLT 2.0

What is involved in upgrading from XSLT 1.0 to 2.0?
1 - What are the possible reasons for upgrading?
2 - What are the possible reasons for NOT upgrading?
3 - And finally, what are the steps to upgrading?
I'm hoping for an executive summary--the short version :)
What is involved in upgrading from XSLT 1.0 to 2.0?
1 - What are the possible reasons for upgrading?
If you are an XSLT programmer you'll benefit largely from the more convenient and expressive XSLT 2.0 language + XPath 2.0 and the new XDM (XPath Data Model).
You may want to watch this XSLT 2.0 Pluralsight course to get firm and systematic understanding of the power of XSLT 2.0.
You have:
Strong typing and all XSD types available.
The ability to define your own (schema) types.
The XPath 2.0 sequence type, which has no counterpart at all in XPath 1.0.
The ability to define and write functions in pure XSLT -- the xsl:function instruction.
Range variables in XPath expressions (the for clause).
Much better and more powerful string processing -- XPath 2.0 supports regular expressions in its tokenize(), matches() and replace() functions.
Much better and more powerful string processing -- XSLT 2.0 support for regular expressions -- the xsl:analyze-string, xsl:matching-substring and xsl:non-matching-substring new XSLT instructions.
More convenient, powerful and expressive grouping: the xsl:for-each-group instruction.
A lot of new, very powerful XPath 2.0 functions -- such as the functions on date, time and duration, just to name a few.
The new XPath operators intersect, except, is, >>, <<, some, every, instance of, castable as, ..., etc.
The general XPath operators >, <, etc. now work on any ordered value type (not only on numbers as in XPath 1.0).
New, safer value comparison operators: lt, le, eq, gt, ge, ne.
The XPath 2.0 to operator, which makes it possible to write xsl:for-each select="1 to $N".
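A few of the features listed above can be sketched in one small XSLT 2.0 stylesheet. The input shape (a people element with person children carrying name and city attributes), the my: namespace and the function name my:initials are assumptions made for this example, and it assumes every person has a name attribute.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:functions"
    exclude-result-prefixes="xs my">

  <!-- xsl:function with declared parameter and return types -->
  <xsl:function name="my:initials" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <!-- tokenize() takes a regular expression; "for" binds a range variable -->
    <xsl:sequence select="string-join(
        for $w in tokenize(normalize-space($name), '\s+')
        return substring($w, 1, 1), '')"/>
  </xsl:function>

  <xsl:template match="/people">
    <report>
      <!-- grouping with xsl:for-each-group -->
      <xsl:for-each-group select="person" group-by="@city">
        <city name="{current-grouping-key()}" count="{count(current-group())}">
          <xsl:for-each select="current-group()">
            <initials><xsl:value-of select="my:initials(@name)"/></initials>
          </xsl:for-each>
        </city>
      </xsl:for-each-group>
      <!-- the "to" operator builds the integer sequence 1, 2, 3 -->
      <xsl:for-each select="1 to 3">
        <row n="{.}"/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```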
These, and many other improvements/new features significantly increase the productivity of any XSLT programmer, which allows XSLT 2.0 development to be finished in a small fraction of the time necessary for developing the same modules with XSLT 1.0.
Strong typing allows many errors to be caught at compile time and to be corrected immediately. For me this strong type-safety is the biggest advantage of using XSLT 2.0.
2 - What are the possible reasons for NOT upgrading?
It is often possible, reasonable and cost-efficient to leave existing, legacy XSLT 1.0 applications untouched and to continue using them with XSLT 1.0, while at the same time developing only new applications using XSLT 2.0.
Your management + any other non-technical reasons.
Having a lot of legacy XSLT 1.0 applications written in a poor style (e.g. using DOE, that is disable-output-escaping, or extension functions) that would now need to be rewritten and the code refactored.
Not having available an XSLT 2.0 processor.
3 - And finally, what are the steps to upgrading?
Change the version attribute of the xsl:stylesheet or xsl:transform element from "1.0" to "2.0".
Remove any xxx:node-set() functions.
Remove any DOE.
Be ready for the surprise that xsl:value-of now outputs not just the first, but all items of a sequence.
Try to use the new xsl:sequence instruction as much as possible -- use it to replace any xsl:copy-of instructions; use it instead of xsl:value-of any time when the type of the output isn't string or text node.
Test extensively.
When the testing has verified that the code works as expected, start refactoring (if deemed necessary). It is a good idea to declare types for any variables, parameters, templates and functions. Doing so may reveal new, hidden errors and fixing them increases the quality of your code.
Optionally, decide which named templates to rewrite as xsl:function.
Decide if you still need some extension functions that are used in the old version, or you can rewrite them easily using the new, powerful capabilities of XSLT.
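As a rough illustration of what several of these steps look like after migration, here is a sketch of a small 2.0 stylesheet. The input shape (items/item with name attributes) and the lookup data are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A temporary tree. In 1.0 this was a result tree fragment that
       needed an xxx:node-set() call before it could be navigated. -->
  <xsl:variable name="lookup">
    <entry code="a">Alpha</entry>
    <entry code="b">Beta</entry>
  </xsl:variable>

  <xsl:template match="/items">
    <out>
      <!-- 2.0 outputs every selected item; the separator is explicit here -->
      <names><xsl:value-of select="item/@name" separator=", "/></names>
      <!-- to keep the 1.0 result, select the first item explicitly -->
      <first><xsl:value-of select="(item/@name)[1]"/></first>
      <!-- path straight into the variable, no node-set() call -->
      <label><xsl:value-of select="$lookup/entry[@code = 'b']"/></label>
      <!-- xsl:sequence where 1.0 code would have used xsl:copy-of -->
      <copied><xsl:sequence select="item"/></copied>
    </out>
  </xsl:template>

</xsl:stylesheet>
```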
Final remarks: Not all of the above steps are necessary and one can stop and declare the migration successful on zero bug testing results. It is much cleaner to start using all XSLT 2.0/XPath 2.0 features in new projects.
Dimitre's answer is very comprehensive and 100% accurate (as always) but there is one point I would add. When upgrading to a 2.0 processor, you have a choice of leaving the version attribute set to "1.0" and running in "backwards compatibility mode", or changing the version attribute to "2.0". People often ask which approach is recommended.
My advice is, if you have a good set of tests for your stylesheets, take the plunge: set version="2.0", run the tests, and if there are any problems, fix them. Usually the problems will be code that was never quite right in the first place and only worked by accident. But if you don't have a good set of tests and are concerned about the reliability of your workload, then leaving version="1.0" is a lower-risk approach: the processor will then emulate all the quirks of XSLT 1.0, such as xsl:value-of ignoring all but the first item, and the strange rules for comparing numbers with strings.

differences between for-each and templates in xsl?

Both xsl:for-each and xsl:template are used to retrieve nodes from xml in an xsl stylesheet. But what is the fundamental difference between them? Please guide me. Thanks in advance.
I generally agree with the other answers, but I will say that in my experience, a stylesheet written with xsl:for-each can be a lot easier to read, understand, and maintain than one that relies heavily on xsl:apply-templates... Especially xsl:apply-templates with an implicit select (or a very generic select like select="node()").
Why? Because it's very easy to see what a for-each will do. With apply-templates, you in essence have to (a) know about all possible XML inputs, so you can predict what kinds of nodes will be selected (e.g. children of the context node); and (b) look at every template in the stylesheet to see which template matches those nodes with the highest priority (and comes last in document order). Point (a) is easier if you have a schema, but then you still have to digest the schema. Many times you don't have one, especially for transient intermediate XML data passed between stages of a pipeline. And even if you do, your development framework (such as an ESB or CMS) may not give you a way to validate your XML at every point in your pipelines, so if invalid data creeps in you will not be notified right away. The order of processing may also skip all over the file, or across different files (imported or included). This can make it very difficult to "see" what's going on.
Whereas with a for-each, you know exactly which code will get instantiated: the code inside the for-each. And since for-each requires an explicit select expression, you're more likely to have a narrower field to guess from regarding what nodes can be matched.
Now I'm not denying that apply-templates is much more powerful and flexible than for-each. That's exactly the point: constructs that are more powerful and flexible, are also harder to constrain, understand, and debug (and prevent security holes in). It's the Rule of Least Power: "Powerful languages (or in this case, constructs) inhibit information reuse." (Also discussed here.)
When you use apply-templates, each template is more modular and therefore more reusable in itself, but the stylesheet is more complex and the interaction between templates is less predictable. When you use for-each, the flow of processing is easy to predict and see.
With <xsl:apply-templates /> (or with <xsl:for-each select="node()"/>), when the structure of the input XML changes, the behavior of the stylesheet changes too, without the developer having reviewed it. Whether this is good or bad depends on how much forethought you've put into your stylesheet, and how much good communication there is between the XML schema developer and the stylesheet developer (who may be the same person or may belong to different organizations).
So to me, it's a judgment call. If you have document-oriented XML, like HTML, where lots of the element types really can have many different types of children, in an arbitrary-depth hierarchy, and the processing of a given element type doesn't depend very often on its context, then apply-templates is absolutely essential. On the other hand if you have "data-oriented" XML, with a predictable structure, where you don't often have the same element type meaning the same thing in different contexts, for-each can be much more straightforward to read and debug (and therefore to write correctly and quickly).
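For the data-oriented case, a for-each based stylesheet might look like the sketch below. The orders/order/line structure is invented for the example; the point is only that the whole flow of processing can be read from a single template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Pull style: everything that happens is visible in this one template -->
  <xsl:template match="/orders">
    <table>
      <xsl:for-each select="order">
        <tr>
          <td><xsl:value-of select="@id"/></td>
          <td><xsl:value-of select="customer/name"/></td>
          <!-- sum() adds up the amount attribute of every line in this order -->
          <td><xsl:value-of select="sum(line/@amount)"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```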
I think this has more to do with understanding push vs. pull style processing than with just comparing xsl:for-each and xsl:template match="...". You often see programmers from another discipline using a lot of xsl:if, xsl:choose and for-loops when the problem could have been solved in a more elegant, XSLT-ish way.
But to the question: In my opinion, if you consider using xsl:for-each instead of processing the data with xsl:apply-templates you need to rethink. There are cases where a for-loop is suitable in XSLT, but whenever a matching template would do the same, templates are the way to go. In my experience, you can usually do most xsl:for-each with an xsl:apply-templates instead.
Some benefits as I see it of using matching templates over a for-loop are:
The stylesheets are easier to maintain and extend especially if source data changes.
As @chiborg mentions, templates can be reused since they are not built into a specific template. Together with xsl:next-match in XSLT 2.0 you can chain templates together in powerful ways.
You don't have to mimic behavior already built in to all XSLT processors, that is; use xsl:apply-templates and let the processor work for you.
Also, I find it easier to understand and debug a push style stylesheet. If you divide your stylesheet into small templates which each do one or a few things, and write specific matching patterns, it's easy to see which template is doing what and to trace the source of a problem.
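A push style stylesheet of the kind described here could be sketched as follows. The element names (article, title, para, emph) are assumptions chosen for the example. Each template handles one element type and hands its children back to the processor; text nodes are copied by the built-in template rule.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="article">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- works wherever emph occurs, at any depth, with no extra loop -->
  <xsl:template match="emph">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
```

Supporting a new element type then means adding one more template, without touching the existing ones.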
Both the 'for-each' and 'template' are used to retrieve the nodes from xml in the xsl. But what, basically, is the difference between them?
Here are some of the most important differences:
xsl:apply-templates is much richer and deeper than xsl:for-each, if only because we don't know what code will be applied to the nodes of the selection -- in the general case this code will be different for different nodes of the node-list.
The code that will be applied can be written long after the xsl:apply-templates was written, and by people who do not know the original author.
The FXSL library's implementation of higher-order functions (HOF) in XSLT wouldn't be possible if XSLT didn't have the <xsl:apply-templates> instruction.
Summary: Templates and the <xsl:apply-templates> instruction are how XSLT implements and deals with polymorphism.
Reference: See this whole thread: http://www.stylusstudio.com/xsllist/200411/post60540.html
It doesn't really matter, but you may want to consider the following rules of thumb that I found:
If the code depends on the context position (position()), put it in an <xsl:for-each>.
If the code depends on the context node (. or any location path), put it in a matching template.
Otherwise, use a named template.
Reference and read more at:
http://www.jenitennison.com/blog/node/9
These are two completely different XSLT instructions.
More than push vs. pull style, this is more like iteration vs. recursion.
xsl:for-each is an iterator instruction with all the benefits and constraints of iteration in a stateless declarative paradigm: a good processor should not pollute the call stack.
xsl:apply-templates is a general recursion instruction. General in the sense that it's more powerful than xsl:call-template: you "throw" the selected nodes to the pattern matching mechanism, a truly "dynamic function invocation".
for-each can only be used inside one place in your template. Templates can be re-used with different apply-templates calls. I mostly use templates instead of for-each because of the added flexibility.
One use of for-each I haven't seen mentioned: you can use it to switch the context node to another document. I've used it to transform data XML to a HTML input form. The template that matched a data field contained a for-each that selected a single node: the xs:element in the XSD that described the data field to transform.
Using the description in the XSD one data field could be transformed to a radio button group, a drop down box, or a plain and simple text input. Without for-each I couldn't walk through the two documents at the same time.
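The technique can be sketched roughly like this. The file name form.xsd, the field element and its name attribute are hypothetical; the real stylesheet is not shown in the answer, so this only illustrates the context switch, not the original code.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="field">
    <!-- remember values from the data document before the context changes -->
    <xsl:variable name="fieldName" select="@name"/>
    <xsl:variable name="value" select="string(.)"/>
    <!-- for-each over a single node: its only job is to move the
         context into the schema document -->
    <xsl:for-each select="document('form.xsd')//xs:element[@name = $fieldName]">
      <xsl:choose>
        <!-- an enumerated type becomes a drop down box -->
        <xsl:when test=".//xs:enumeration">
          <select name="{$fieldName}">
            <xsl:for-each select=".//xs:enumeration">
              <option value="{@value}"><xsl:value-of select="@value"/></option>
            </xsl:for-each>
          </select>
        </xsl:when>
        <!-- anything else becomes a plain text input -->
        <xsl:otherwise>
          <input type="text" name="{$fieldName}" value="{$value}"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```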
In general, I prefer matching templates. I find that it corresponds to the notion of a single transform applied at once better than for-each-ing this node, then the next, then the one after that, etc.. But that's a personal preference, of course.

Does LINQ to XML replace XSLT?

Is there anything you can do in XSLT that can't be done in LINQ to XML? Is it still important to learn XSLT? When would you choose one over the other?
Is there anything you can do in XSLT that can't be done in Linq to XML?
No, since LINQ to XML is an API used by Turing-complete programming languages, and covers more of XML Infoset than XSLT document model does (e.g. you can fully control the difference between text and CDATA nodes in L2X).
Is it still important to learn XSLT?
Depends on what you're doing. Broadly speaking, yes.
When would you choose one over the other?
XSLT is generally better when you need to do a transformation - i.e. both input and output is XML. There are a number of reasons for that. First of all, XSLT pattern matching is usually more concise than nested ?: in L2X queries, and far more readable. You can also use * to great effect to set up a default rule (like "copy everything", or "process children but do not generate output"), and then add rules for specific nodes you need to process in a special way - thus you do not need to write explicit loops/comprehensions for each node level in the document, as you often do in L2X. Finally, XPath is also more concise than L2X queries (at least in C#), so if you do a lot of non-trivial querying, it's likely to be far shorter and more readable in XSLT.
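The "default rule plus specific exceptions" idea is usually written as an identity template with overrides. A minimal sketch, where the element names internal-note and b are assumptions for the example:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default rule: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Exception: drop internal-note elements and their content -->
  <xsl:template match="internal-note"/>

  <!-- Exception: rename b to strong, keeping attributes and children -->
  <xsl:template match="b">
    <strong><xsl:apply-templates select="@*|node()"/></strong>
  </xsl:template>

</xsl:stylesheet>
```

The more specific patterns win over the generic one because of the default priority rules, so no explicit loop per nesting level is needed.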
L2X is generally better when you need to quickly query a document for some value or node. The main advantage here is that there's less runtime overhead (XPath needs to be parsed, L2X query does not), and you don't need to mess with XmlNamespaceManager and other cruft - the API is streamlined for writing single-expression queries. As well, having nested from loops and let brings it closer to XQuery territory.
L2X is also the only choice when you need an in-place update of the document, and may be better when you only need to replace a few values in the document, and in-place update is an option - since XSLT doesn't let you touch the input in any way.
It is definitely still important to learn XSLT. LINQ to XML is great, but its use is limited to .NET apps.
XSLT can be applied across languages and platforms...even browsers can take XML and apply an XSLT to generate an output.
Don't forget that some .NET application APIs (CMS systems for example) still require you to supply XSLT to transform internal XML into an output. Ignoring the technology altogether would be, in my opinion, a real mistake.
Not for anyone not using .NET

Two concepts from XSLT in other languages: apply-templates and xpath

Background: Having given up on the practical daily use of XSLT as a part of my programming toolkit, I was wondering if there were any implementations in other languages of the (only) two things I miss about that tool:
the ability to traverse data structures using "path" style statements via xpath
the ability to traverse template transformations using apply-templates instead of via an iterative or "looping" approach.
According to Google there are a couple of efforts out there to add "xpath-style" support to Javascript, but these have not apparently caught on very much. So far I haven't found anything where someone uses an "apply-templates" approach in another language
Question: Does anyone out there know of a programming language (hopefully one that is main-stream) that steals these two good ideas from XSLT, or applies the same or similar concepts using a different method?
the ability to traverse data structures using "path" style statements via xpath
I'm not aware of any other language that embeds XPath, but LINQ to XML is somewhat similar, particularly in its VB syntactic sugar incarnation. You could implement it in Common Lisp macros, or D templates, however.
the ability to traverse template transformations using apply-templates instead of via an iterative or "looping" approach.
No mainstream languages that I know of. Indeed, this feature is probably the main reason to use XSLT (and not e.g. XQuery, looking at closely related languages).
It's effectively extensible dynamic dispatch on receiver on arbitrary conditions - as such, I think you could probably do it in Common Lisp (CLOS, to be specific) - if I remember correctly, its multimethods can match arbitrary conditions, so if you have an XPath pattern evaluator, you could use it to emulate apply-templates, and even more - since apply-templates only dispatches on a single argument, while CLOS multimethods dispatch on multiple arguments.
XPath, while essential to making XSLT work, is independent of it; libraries like libxml give you it for free. The style of template application you describe is a little trickier; that's what you would normally use XSLT for.
Any programming language that does this should be functional. You could try writing your own, less-verbose, XSLT dialect; Perl also may give you enough rope to emulate this feature convincingly (although the performance implications are unclear).
The tough answer, though, is that this doesn't really exist, except as libraries for already existing languages.
For XPath, definitely. For C, there's Xalan-C++, for Java javax.xml.xpath (with multiple implementations), and C# has XPathNavigator and SelectNodes. If you want to use XPath for object hierarchies, look at JXPath.
For the template transformations, you should look at C#'s LINQ if you haven't already. It's not exactly the same thing, but it allows processing objects without explicit looping.
I have found nothing like that. But why would anybody use anything else to transform XML ? XSLT does a perfect job once you understand the non procedural way of developing solutions. Our applications are largely XSLT based and it is a really powerful tool.
A comment on your first requirement:
the ability to traverse data structures using "path" style statements via xpath
XPath makes a lot of assumptions on the data structure. If you're going to use it, you might as well convert your structure to XML because it's going to look like it anyway once you make it traversable via some XPath-like language unless you severely limit your XPath subset.
Also, keep in mind that the "only two things" that you are missing, XPath and template processing, are in fact a huge part of what makes up XSLT. I'm curious why you decided to take it off your tool-belt.
In spite of the fact that you wanted an XSLT alternative, I would still recommend XSLT, and XSLT 2.0 in particular. With the addition of unparsed-text() and xsl:analyze-string you have a powerful text processing language. For example, take a look at a CSV to XML stylesheet. Even though JSON isn't regular, you'd still be able to write a simple JSON to XML translator using recursive templates and transform the result at will.
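As a rough idea of what such a CSV to XML stylesheet can look like, here is a minimal XSLT 2.0 sketch. It is not the stylesheet linked above: the parameter name csv-uri, the default file name data.csv and the output element names are assumptions, and it deliberately ignores quoted fields and empty cells.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- location of the CSV file, resolved relative to the stylesheet -->
  <xsl:param name="csv-uri" as="xs:string" select="'data.csv'"/>

  <!-- named entry point: there is no XML input document to match -->
  <xsl:template name="main">
    <rows>
      <!-- read the file as text, split it into lines, skip blank lines -->
      <xsl:for-each select="tokenize(unparsed-text($csv-uri), '\r?\n')[normalize-space()]">
        <row>
          <!-- every run of non-comma characters becomes one cell -->
          <xsl:analyze-string select="." regex="[^,]+">
            <xsl:matching-substring>
              <cell><xsl:value-of select="."/></cell>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </row>
      </xsl:for-each>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet has to be started at the named template main, which most XSLT 2.0 processors allow through an initial-template option.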

Is XSLT a functional programming language?

Several questions about functional programming languages have got me thinking about whether XSLT is a functional programming language. If not, what features are missing? Has XSLT 2.0 shortened or closed the gap?
XSLT is declarative as opposed to stateful.
Although XSLT is based on functional programming ideas, it is not a full functional programming language: it lacks the ability to treat functions as a first-class data type. It does have features such as lazy evaluation, which reduces unneeded evaluation, and the absence of explicit loops.
Like a functional language though, I would think that it can be nicely parallelized with automatic safe multi threading across several processors.
From Wikipedia on XSLT:
As a language, XSLT is influenced by functional languages, and by text-based pattern matching languages like SNOBOL and awk. Its most direct predecessor was DSSSL, a language that performed the same function for SGML that XSLT performs for XML. XSLT can also be considered as a template processor.
Here is a great site on using XSLT as a functional language with the help of FXSL. FXSL is a library that implements support for higher-order functions.
Because of FXSL I don't think that XSLT has a need to be fully functional itself. Perhaps FXSL will be included as a W3C standard in the future, but I have no evidence of this.
I am sure you guys have found this link by now :-) http://fxsl.sourceforge.net/articles/FuncProg/Functional%20Programming.html .
Well, functions in XSLT are first-class citizens after all, with some workarounds :-)
That is sort of how it feels when I am programming it.
XSLT is entirely based on defining functions and applying them to selected events that come down the input stream.
XSLT lets you set a variable. Functional programming does not allow functions to have side effects - and that is a biggie.
Still, writing in XSLT, one has the same "feel" as working in an FP fashion. You are working with input - you are not changing it - to create output.
This is a very, very different programming model from that used when working with the DOM API. DOM does not separate input and output at all. You are handed a data structure - and you mangle it how you see fit - without hesitation, restriction, or remorse.
Suffice it to say if you like FP and the principles behind it, you will probably feel comfortable working in it. Just like experience with event driven programming - and XML itself - will make you comfortable with it as well.
If your only experience is with top-down, non event driven programs - then XSLT will be very unfamiliar, alien landscape indeed. At least at first. Growing a little experience and then coming back to XSLT when XPath expressions and event-handling are really comfortable to you will pay off handsomely.
For the most part, what makes XSLT not a 100% functional programming language is its inability to treat functions as a first-class data type.
There may be some others -- but that's the obvious answer.
Good luck!
Saxon-SA has introduced some extension functions which make XSLT functional. You can use saxon:function() to create a function value (actually a {http://net.sf.saxon/java-type}net.sf.saxon.expr.UserFunctionCall value) which you then call with saxon:call().
Saxon-B has similar functionality with the pairing of saxon:expression() and saxon:eval(). The difference is that saxon:expression() takes any XPath expression, and saxon:eval() evaluates it, whereas saxon:function() takes the name of a function which saxon:call() calls.
That is not really an argument, since you can only declare variables, not change their values after declaration. In that sense it is declarative not imperative style, as stated in Mr Novatchev's article.
Functional programming languages like Scheme or Erlang enable you to declare variables as well, and in Haskell you can also do that:
-- function 'test' takes a list xs and adds the locally bound x to every element
test :: [Int] -> [Int]
test xs = map (+ x) xs
  where x = 2
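An XSLT counterpart of the Haskell snippet might look like the sketch below; the numbers/n input structure is an assumption. As in the where clause, x is bound once, and the language has no instruction that could assign it a new value afterwards.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/numbers">
    <!-- x is bound here for the rest of the template and cannot be changed -->
    <xsl:variable name="x" select="2"/>
    <result>
      <!-- plays the role of map (+ x) over the n children -->
      <xsl:for-each select="n">
        <n><xsl:value-of select=". + $x"/></n>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```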