Does LINQ to XML replace XSLT? - xslt

Is there anything you can do in XSLT that can't be done in LINQ to XML? Is it still important to learn XSLT? When would you choose one over the other?

Is there anything you can do in XSLT that can't be done in Linq to XML?
No, since LINQ to XML is an API used by Turing-complete programming languages, and covers more of XML Infoset than XSLT document model does (e.g. you can fully control the difference between text and CDATA nodes in L2X).
Is it still important to learn XSLT?
Depends on what you're doing. Broadly speaking, yes.
When would you choose one over the other?
XSLT is generally better when you need to do a transformation - i.e. both input and output is XML. There are a number of reasons for that. First of all, XSLT pattern matching is usually more concise than nested ?: in L2X queries, and far more readable. You can also use * to great effect to set up a default rule (like "copy everything", or "process children but do not generate output"), and then add rules for specific nodes you need to process in a special way - thus you do not need to write explicit loops/comprehensions for each node level in the document, as you often do in L2X. Finally, XPath is also more concise than L2X queries (at least in C#), so if you do a lot of non-trivial querying, it's likely to be far shorter and more readable in XSLT.
L2X is generally better when you need to quickly query a document for some value or node. The main advantage here is that there's less runtime overhead (XPath needs to be parsed, L2X query does not), and you don't need to mess with XmlNamespaceManager and other cruft - the API is streamlined for writing single-expression queries. As well, having nested from loops and let brings it closer to XQuery territory.
L2X is also the only choice when you need an in-place update of the document, and may be better when you only need to replace a few values in the document, and in-place update is an option - since XSLT doesn't let you touch the input in any way.

It is definitely still important to learn XSLT. LINQ to XML is great, but it's use is limited to .NET Apps.
XSLT can be applied across languages and platforms...even browsers can take XML and apply an XSLT to generate an output.
Don't forget that some .NET Application API's (CMS systems for example) still require you to supply XSLT to transform internal XML into an output. Ignoring the technology all together would be, in my opinion, a real mistake.

Not for anyone not using .NET

Related

Good Option for XML Edit/Replace

I have a huge (100k+ lines, 5MB+) XML which acts as a database for my C++ Application. The structure of the XML is quite straight forward, for example, it has chunks of:
<foo>
<bar prop="true"/>
<baz>blah</baz>
</foo>
The nesting of tags is several levels deep and there are many items with multiple properties. What is a good way to find and replace chunks of this kind of a file? For example, assume that the above section is repeated a few dozen times and in each chunk the value of the tag <baz> is different. I'd like to make edits such as:
Setting all the values contained in tag <baz> to a given value.
Remove chunks containing certain values
Etc.
So far, I've learnt of the following methods for accomplishing this:
Find/Replace: A no-brainer, trivial solution and also my last fall-back. This approach, IMHO is the most time consuming, error prone and painful method. The absolute last resort.
RegExes: Use regular expressions to match blocks of interest and edit them using replacement expressions. Kinda like this blog entry: http://blogs.msdn.com/b/vseditor/archive/2004/08/12/213770.aspx. But I feel this would be error prone and there could be a bunch of missed items if the regex is not exactly right the first time around.
Parser & Save: Whip up a quick program to parse the XML using Xerces or XML DOM Interfaces (or some other XML library), read the XML in, manipulate it as desired and save back to disk. Again, this approach is a slow process, but once its up and running, easy to make modifications and more flexible then RegExes.
Are there any better ways to deal with this?
(EDIT: Thanks for all the redo it to use a DB suggestions, I know its a huge mess but by "better ways to deal with this" I meant the "find/replace" part. )
If you don't want to put the entire document in memory, I would read it using a SAX parser. As you read it, you append the transformed document to a second (or a temp) file. I think it could be pretty fast, and use only a little memory footprint.
Are there any better ways to deal with this?
If you must use XML, you could use an XML database such as BDB XML (which has C++ APIs). It supports XQuery, transactions, etc.
Other options include TinyXML which I've used with success in the past. Quick and easy to use, not necessarily the fastest on a file of that size, but it will get the job done.
What are your actual memory constraints? 5MB is large but not enormous by current RAM standards.
I would use DOM with XPath if you can, it will be a lot less development work than SAX or other stream-based parsing. My problem with SAX is that if you are really using this as a in-memory DB, that implies random access on-demand and SAX is not well-suited for that - you will have to parse and reserialize over and over, whereas once you have the DOM at least you can play with it as you like.
Echo comments about to store in-RAM database info too. Plenty of alternatives that are better suited to this than XML. Maybe you could implement a tactical solution using DOM/XPath and investigate rip-and-replace as a longer-term project.

differences between for-each and templates in xsl?

Both xsl:for-each and xsl:template are used to retrieve nodes from xml in an xsl stylesheet. But what is the fundamental difference between them? Please guide me. Thanks in advance.
I generally agree with the other answers, but I will say that in my experience, a stylesheet written with xsl:for-each can be a lot easier to read, understand, and maintain than one that relies heavily xsl:apply-templates... Especially xsl:apply-templates with an implicit select (or a very generic select like select="node()").
Why? Because it's very easy to see what a for-each will do. With apply-templates, you in essence have to (a) know about all possible XML inputs (which will be easier if you have a schema, but then you still have to digest the schema; and many times you don't have a schema, especially for transient intermediate XML data sent on one stage of a pipeline; and even if you have a schema, your development framework (such as an ESB or CMS) may not give you a way to validate your XML at every point your pipelines. So if invalid data creeps in you will not be notified right away), so you can predict what kinds of nodes will be selected (e.g. children of the context node); and (b) look at every template in the stylesheet to see which template matches those nodes with highest priority (and last in document order). The order of processing may also skip all over the files, or over different files (imported or included). This can make it very difficult to "see" what's going on.
Whereas with a for-each, you know exactly which code will get instantiated: the code inside the for-each. And since for-each requires an explicit select expression, you're more likely to have a narrower field to guess from regarding what nodes can be matched.
Now I'm not denying that apply-templates is much more powerful and flexible than for-each. That's exactly the point: constructs that are more powerful and flexible, are also harder to constrain, understand, and debug (and prevent security holes in). It's the Rule of Least Power: "Powerful languages (or in this case, constructs) inhibit information reuse." (Also discussed here.)
When you use apply-templates, each template is more modular and therefore more reusable in itself, but the stylesheet is more complex and the interaction between templates is less
predictable. When you use for-each, the flow of processing is easy to predict and see.
With <xsl:apply-templates />, (or with <xsl:for-each select="node()"/>), when structure of the input XML changes, the behavior of the stylesheet changes, without the developer's review. Whether this is good or bad depends on how much forethought you've put into your stylesheet, and how much good communication there is between the XML schema developer and the stylesheet developer (who may be the same person or may belong to different organizations).
So to me, it's a judgment call. If you have document-oriented XML, like HTML, where lots of the element types really can have many different types of children, in an arbitrary-depth hierarchy, and the processing of a given element type doesn't depend very often on its context, then apply-templates is absolutely essential. On the other hand if you have "data-oriented" XML, with a predictable structure, where you don't often have the same element type meaning the same thing in different contexts, for-each can be much more straightforward to read and debug (and therefore to write correctly and quickly).
I think this has some what to do with understanding push vs. pull style processing than just comparing xsl:for-each or xsl:template match="...". You often see programmers from another discipline using a lot of xsl:if, xsl:choose and for-loops when the problem could have been solved in a more elegant XSLTish way.
But to the question: In my opinion, if you consider using xsl:for-each instead of processing the data with xsl:apply-templates you need to rethink. There are cases where a for-loop is suitable in XSLT, but whenever a matching template would do the same, templates are the way to go. In my experience, you can usually do most xsl:for-each with an xsl:apply-templates instead.
Some benefits as I see it of using matching templates over a for-loop are:
The stylesheets are easier to maintain and extend especially if source data changes.
As #chiborg mentions, templates can be reused since they are not built into a specific template. Together with xsl:next-match in XSLT 2.0 you can chain templates together in powerful ways.
You don't have to mimic behavior already built in to all XSLT processors, that is; use xsl:apply-templates and let the processor work for you.
Also, I find it easier to understand and debug a push style stylesheet. If you divide your stylesheet info small templates which do one or a few things and write specific matching patterns it's easy to see which template is doing what and trace the source of the problem.
Both the 'for-each' and 'template' are
used to retrieve the nodes from xml in
the xsl. But what is the difference
between them in basically
Here are some of the most important differences:
xsl:apply-templates is much richer and deeper than xsl:for-each, even
simply because we don't know what code will be applied on the nodes of
the selection -- in the general case this code will be different for
different nodes of the node-list.
The code that will be applied
can be written way after the xsl:apply templates was written and by
people that do not know the original author.
The FXSL library's implementation of higher-order functions (HOF) in XSLT wouldn't be possible if XSLT didn't have the <xsl:apply-templates> instruction.
Summary: Templates and the <xsl:apply-templates> instruction is how XSLT implements and deals with polymorphism.
Reference: See this whole thread: http://www.stylusstudio.com/xsllist/200411/post60540.html
It doesn't really matter, but you may want to think about it for the following thumb of rules that I found:
If the code depends on the context
position (position()), put it in a
<xsl:for-each>.
If the code depends on the context
node (. or any location path), put it
in a matching template.
Otherwise, use a named template.
Reference and read more at:
http://www.jenitennison.com/blog/node/9
These are to complete different XSLT instructions.
More than push vs. pull style, this is more like iteration vs. recursion.
xsl:for-each is an iterator instruction with all the benefits and constrains of iteration in a stateless declarative paradigm: a good processor should not polute the call stack.
xsl:apply-templates is a general recursion instruction. General in the sense that it's more powerful than xsl:call-template: you "throw" the selected nodes to the pattern matching mechanism, a truly "dynamic function invocation".
for-each can only be used inside one place in your template. Templates can be re-used with different apply-templates calls. I mostly use templates instead of for-each because of the added flexibility.
One use of for-each I haven't seen mentioned: you can use it to switch the context node to another document. I've used it to transform data XML to a HTML input form. The template that matched a data field contained a for-each that selected a single node: the xs:element in the XSD that described the data field to transform.
Using the description in the XSD one data field could be transformed to a radio button group, a drop down box, or a plain and simple text input. Without for-each I couldn't walk through the two documents at the same time.
In general, I prefer matching templates. I find that it corresponds to the notion of a single transform applied at once better than for-each-ing this node, then the next, then the one after that, etc.. But that's a personal preference, of course.

Two concepts from XSLT in other languages: apply-templates and xpath

Background: Having given up on the practical daily use of XSLT as a part of my programming toolkit, I was wondering if there were any implementations in other languages of the (only) two things I miss about that tool:
the ability to traverse data structures using "path" style statments via xpath
the ability to traverse template transformations using apply-templates instead of via an iterative or "looping" approach.
According to Google there are a couple of efforts out there to add "xpath-style" support to Javascript, but these have not apparently caught on very much. So far I haven't found anything where someone uses an "apply-templates" approach in another language
Question: Does anyone out there know of a programming language (hopefully one that is main-stream) that steals these two good ideas from XSLT, or applies the same or similar concepts using a different method?
the ability to traverse data structures using "path" style statments via xpath
I'm not aware of any other language that embeds XPath, but LINQ to XML is somewhat similar, particularly in its VB syntactic sugar incarnation. You could implement it in Common Lisp macros, or D templates, however.
the ability to traverse template transformations using apply-templates instead of via an iterative or "looping" approach.
No mainstream languages that I know of. Indeed, this feature is probably the main reason to use XSLT (and not e.g. XQuery, looking at closely related languages).
It's effectively extensible dynamic dispatch on receiver on arbitrary conditions - as such, I think you could probably do it in Common Lisp (CLOS, to be specific) - if I remember correctly, its multimethods can match arbitrary conditions, so if you have an XPath pattern evaluator, you could use it to emulate apply-templates, and even more - since apply-templates only dispatches on a single argument, while CLOS multimethods dispatch on multiple arguments.
XPath, while essential to making XSLT work, is independent of it; libraries like libxml give you it for free. The style of template application you describe is a little trickier; that's what you would normally use XSLT for.
Any programming language that does this should be functional. You could try writing your own, less-verbose, XSLT dialect; Perl also may give you enough rope to emulate this feature convincingly (although the performance implications are unclear).
The tough answer, though, is that this doesn't really exist, except as libraries for already existing languages.
For XPath, definitely. For C, there's Xalan-C++, for Java javax.xml.xpath (with multiple implementations), and C# has XPathNavigator and SelectNodes. If you want to use XPath for object hierarchies, look at JXPath.
For the template transformations, you should look at C#'s LINQ if you haven't already. It's not exactly the same thing, but it allows processing objects without explicit looping.
I have found nothing like that. But why would anybody use anything else to transform XML ? XSLT does a perfect job once you understand the non procedural way of developing solutions. Our applications are largely XSLT based and it is a really powerful tool.
A comment on your first requirement:
the ability to traverse data structures using "path" style statments via xpath
XPath makes a lot of assumptions on the data structure. If you're going to use it, you might as well convert your structure to XML because it's going to look like it anyway once you make it traversable via some XPath-like language unless you severely limit your XPath subset.
Also, keep in mind that the "only two things" that you are missing, XPath and template processing, are in-fact a huge part of what makes up Xslt. I'm curious why you decided to take it off of your tool-belt.
In spite of that fact that you wanted an Xslt alternative, I would still recommend Xslt and Xslt 2.0 in particular. With the addition of the unparsed-text and analyze-string you have a powerful text processing language. For example take a look at a CSV to XML stylesheet. Even though JSON isn't regular, you'd still be able to write a simple JSON to XML translator using recursive templates and transform the result at will.

What's so bad about xsl:for-each?

I hear time and time again about how you should avoid the use of XSLT for-each. That it's your inner imperative programming demon that should be banished.
What's so bad about it?
Does this best practice matter depending on the size of XML (i.e 100 vs 10,000 nodes)?
Essential difference between <xsl:apply-templates> and <xsl:-for-each> that nobody has pointed out:
<xsl:apply-templates> is really something much more than a nicer, more elegant equivalent of <xsl:for-each>:
xsl:apply-templates is much richer and deeper than xsl:for-each, even
simply because we don't know what code will be applied on the nodes of
the selection -- in the general case this code will be different for
different nodes of the node-list.
Also, the code that will be applied
can be written way after the xsl:apply templates was written and by
people that do not know the original author.
_2. On the other side, using <xsl:for-each> is in no way harmful if one knows exactly how an <xsl:for-each> is processed.
The trouble is that a lot of newcomers to XSLT that have experience in imperative programming take <xsl:for-each> as a substitute of a "loop" in their favorite PL and think that it allows them to perform the impossible -- like incrementing a counter or any other modification of an already defined <xsl:variable>.
One indispensable use of <xsl:for-each> is to change the current document -- this is often needed in order to be able to use the key() function on a document, different from the current source XML document, for example to efficiently access lookup-table that resides in its own xml document.
Templates tend to split up the code more nicely. Also, for-each loops suffer from the fact that people often come to them with the idea that they operate identically to the way for loops work in the major programming languages.
It is the use of a for-each to call templates that is discouraged, not the widespread use of for-each, per se. Even the Muenchian method of grouping relies on xsl:key constructs with xsl:for-each loops.
The idea of writing good XSLT is that the structure of your XML should dictate what templates are matched and then applied. Therefore, whenever possible, use apply-templates to select the nodes, rather than applying a select using for-each.
Quick answer: XSLT is largely functional in nature, and imperative loops are not very functional.
In general the based way to get the best out of XSLT is to use pattern matching as much as possible (xsl:apply-template rather than loops, ifs and call-template1).
In the end, it is all about style, and will make little difference in short pieces of XSLT, but being more functional will help with longer/complex problems.
1 Other than when creating functions which return values rather than modifying the output.
By using apply-templates, perhaps with a mode, makes it easier to include appropriate transformations of more kinds of elements that may be of interest at that location.
For example if you had XML containing a libraries contents, and you keep using for-each all over the place to select books, then you start keeping record of DVDs, CDs, Ginger Nut biscuits, etc, you will have to examine all the for-each sections to include more than just books. If you had used apply-templates, you may be able to simply create the new matches for the new elements.

Using XSLT to process business rules?

A coworker of mine mentioned that one use of XSLT is processing business rules. He mentioned that there were systems that allowed users to write business rules in some kind of text format, and then the program uses XSLT to process the text and apply the rules at run-time in the application.
Can someone shed some light on this subject for me?
Thanks!
Ouch. I wouldn't recommend that.
As the first responder said, XSL-T is for transforming XML. It's not a rules engine. I think it sounds like a misuse of the technology.
XSL-T transforms are not intuitive to write. If one of your goals for business rules is allowing business folks to update and maintain the rules, I can't imagine a more obtuse and difficult technology for doing so than XSL-T.
I suppose your colleague was refering to BPEL, the Business Process Execution Language. BPEL is an XML-based executable language for describing business processes.
Being an XML format, business rules may be generated or transformed using XSLT. However, I'm not familiar with BPEL so I don't know any system doing something like that.
Yes. The somewhat-like text format is called Excel, and users tend to do all kinds of complex things with it. The programmer then spends an awful lot of time trying to process it with every shiny new technology he can find, including XSLT, and finally decides to hand-code around all the inconsistencies. It is not fully automated, as no sane user trusts the programmer to get it right first time.
XSLT stands for XSL Transform. It is used to change an XML document from one form to another.
As for systems, Microsoft BizTalk uses XSLT in mapping operations that map one XML document into another. Within the XSLT the user can make use of .net code to do more complex processing.
I'm sure someone else will have a much nicer explanation but you can easily find out more by Googling XSLT tutorials. It's a huge topic.
It should be possible: write your rules in XML, the case data should also be in XML, and then a generic XSLT could be written that compares the case data against the rules and executes the relevant rules in the correct sequence.
The business users don't need to know XSLT, they just need to know how to write the rules.