Why does XSLT seem to irritate so many people? [closed] - xslt

What is it about XSLT that people find irritating? Is it the syntax (which is pretty unusual) or just the way XSLT works in general? Are there features that are lacking?
I did a little bit of XSLT (around 800 lines) a while ago and found it not that bad. So why the general animosity against it?

I think people find it difficult to get their heads around XSLT (and bitch about it) because it is functional and declarative in nature, unlike C# or Java. Navigating around documents can get complicated when XPath expressions get clever, though this is a feature of XPath rather than XSLT. XPath typically gets complex when you don't know the exact structure of a document at design time, so you start querying siblings, descendants and ancestors. This is when people inheriting a complex XSLT start considering career changes!
With XSLT it is very much 'right tool for the right job'. It is designed to transform an XML document into another XML document extremely quickly and efficiently. XSLT is almost certainly the best tool for this purpose because of its extensibility, the fact that it was written for this purpose, widespread support for it in XML processors across the board and, in case I didn't mention it already, performance. Common use cases:
converting an XML document containing pure data into a document exposing a user interface, such as an XHTML document
converting an XML document into a different structure to suit someone else's schema, e.g. Biz2Biz communications
A great implementation of the XSLT technology is the Apache Cocoon project, which transforms XML documents into multiple output formats including HTML, Excel, chart images and PDFs, with an extensible plugin architecture. We use it a lot for our reporting platform and it works very well. When developers start with it, they run into the same familiar issues. Once they get over them, they typically end up writing what I am writing here.
I once worked with a guy who didn't want to work with (or learn) XSLT and ended up presenting a demo to the client which took over 20 seconds to render a page. When I finally persuaded him to use an XSLT transform instead of his dumb DOM code, it took under a second.

I like XSLT, and use it quite a bit. As long as you think in terms of functional programming (i.e. set-once variables, similar to F# etc.), it is hugely versatile. I use it regularly for data transformation, presentation (in particular [X]HTML), and versatile code generation.
Definitely highly programming related; nobody except a programmer would grok it, but it is a very powerful tool.
I have a few XSLT stylesheets (split over a few xsl:import/xsl:include files) that are substantially longer than the 800 lines you mention in the post... it really can (when used correctly) be a fully featured environment.
Notes:
best used at the server; client-side support is hit-and-miss
a few key things are missing in 1.0: regex, case-insensitivity, etc.
can be tricky if whitespace is important
One particularly useful feature of XSLT (as a separate file) is that it makes it possible to change the transform without rebuilding any code. The code-gen example is from an open source project I run; I know of several users who have dipped in and tweaked the code-gen for their local standards. One user even went as far as writing the transform for an entire second language, all without touching the binaries.

I personally dislike XSLT because it seems to combine several things that are generally disliked in the developer community:
it uses magic strings (XPath) that look like noise, a.k.a. Perl regexes.
XML tags, which can make statements verbose, a.k.a. an XML programming language.

I've worked with XSLT before and I didn't much care for it because I found it extremely verbose for the simple task I wanted to perform.
Just out of curiosity, what did your 800 lines of XSLT do?

XSLT is a really powerful tool in the developer arsenal. I use it all the time for code generation: performance counters, data access layers, REST interfaces, you name it. Anything repetitive.
As a language it sure has its quirks, but as a tool it is invaluable.

Many programmers don't have any experience with Functional Programming. XSLT, in many ways, resembles Functional Programming, and so it is a new and foreign paradigm to learn.
Learning an unfamiliar programming paradigm can be challenging, let alone learning an unfamiliar programming paradigm expressed in XML.
Code written in a Functional Programming language is typically minimalistic. XML is rarely minimalistic. So folks who know Functional Programming and appreciate its minimalism have to give up that minimalism.

I personally think it is very suitable for certain types of programming problems. To me, in certain situations, it is much easier to maintain a form using XSLT than to rewrite, recompile and redeploy code changes. While XSLT is not the only way to accomplish that, I haven't found any other solutions for those cases that are much cleaner and easier.
It has its place. Like everything else, when misused, it becomes a garbled mess of code, just as any language would. When used correctly, it can be a good supplement or solution to a programming problem.

XSLT is very powerful, so long as what you want to do with it matches what it's good for. However, maintaining someone else's XSLT can be a bit daunting. It's a programming language but it's also an XML file, so it can be hard to understand, even when laid out cleanly and adequately commented.

Our Library CMS largely consists of HTML stylesheets to do almost everything. Our data is XML natively, of course. Some of our programmers don't get the functional programming paradigm. Your first experiences might lead to complex templates misusing the iterative features of XSLT. The first thing you have to tell a programmer is not to use the for-each statement or traverse the XPath axes.
If they learn to refrain, they may come to understand the concept of templates.

I find that the people that complain about XSLT are the ones that misuse it. For example, I think using it as an HTML templating language for a CMS is a terrible idea, unless your data is in XML already. Those people might complain that XSLT is ugly, or verbose, or whatever, but that's because they are using it for the wrong reasons.

XSLT is both functional and imperative at the same time. This trips up a lot of people: it has template matching alongside for-each loops and variables.
It is easy to write bad code in it. But if you follow good patterns you can do some really neat things very easily.
Check out http://www.worldofwarcraft.com/index.xml and http://www.wowarmory.com/index.xml if you have an XSLT-capable browser (FF 3 is good). They are totally written in client side XSLT with underlying XML. It makes scraping those sites REALLY easy and nice and they are forced to keep the data and presentation separate. A great example is their character pages http://www.wowarmory.com/character-achievements.xml?r=Mal%27Ganis&cn=Vosk&gn=Juggernaut

It's an example of turning XML into a programming language. Yuck. I wish people wouldn't do that. We have perfectly good programming languages already, and they are far better at it than XML.

because MS doesn't implement exslt2

Cleanest data structure to use when interpreting data from neatly-structured user commands (in C++) [closed]

I would like to write a simple in-house program that parses user commands written in a language of our team's own invention (but based closely on another program we are already familiar with). The command parser that I am working on now will simply be the UI through which the user can run the other algorithms I have already written. (Those other algorithms, by the way, are used to generate the input files for a molecular dynamics simulation package called LAMMPS.) The only thing I really have left to do is write this UI, but as it turns out, writing your own scripting language is an almost intractable challenge for someone who is not a software engineer to tackle on their own.
According to the answers I received, what I am trying to make would be considered a Domain Specific Language, and it is not advisable to try to make one's own DSL due to the enormous amount of work required to make it useful and bug-free.
The best option then would actually be to use an existing scripting language like Lua or Python, and embed it in the program.
To do this, I will most likely use Lua because it seems the best fit for our needs. So at this point, the rest of this question is no longer relevant since the answer would be: "Don't do it yourself." But I'm still going to keep part of it here for other users to be able to read and learn from the wonderful answers below.
Thanks again to everyone who replied!
Old Question:
I would like to write a program that parses a user text input and then
runs a function corresponding to that input. To do this I would need
to parse the string for relevant keywords. I believe there will be
less than 15 keywords when I'm done, so ideally I'd like this code
to be simple and short.
The problem is that I am currently using if-statements to parse the
strings. This is an extremely inconvenient way to parse commands
because even for a short 3-word command the code explodes into nested ifs
3 layers deep. So longer sentences of 8+ words will become nested ifs more than
8 layers deep.
This kind of programming approach quickly becomes unmanageable, especially
when I need to make any significant changes to a command.
My question is whether there exists a data structure in C++ that
can help me better manage my giant nested ifs, or whether anyone could suggest
a better way to parse a string for lots of different data types (i.e.
substrings, ints, and floats) and output an error message when the expected
type is not found?
Here is an example of a short user session to show the kinds of commands
I would like to interpret:
load "Basis.Silicon" as material 1
add material 1 to layer 1
rotate layer 1 about x-axis by 45 degrees
translate layer 1 in x-axis by 10 nm
generate crystal
These commands are based on an already-existing program that our team
uses, but unfortunately the source code for this program has never been
publicly released so I am left guessing as to how it was actually
implemented.
One final note, unlike natural language processors, I know exactly what
the format of each line will be. So my issue isn't so much how to interpret
the text, but rather how to code the logic in a concise and manageable way.
Thanks everyone!
Your question is not clear. And your goals are more difficult than what you believe.
Either you consider that you want to somehow process human language sentences (e.g. in English). Then you want to study natural language processing, and you can find some libraries related to that field.
Or you consider that you want to interpret some formal programming or scripting language. Then you want to study interpreters and compilers. BTW, in that case, you might just embed an existing interpreter (like Lua, Guile, Python, etc....) in your program.
You could also think in terms of expert systems with a knowledge base made of rules (this approach could be viewed as sitting in the middle between NLP and a scripting language). You'll then need some inference engine (perhaps CLIPS). See also J.Pitrat's blog.
Notice that even coding a simple interpreter is more difficult than you believe. You absolutely need to represent abstract syntax trees, which you construct from textual input with a parsing phase.
BTW, NLP, expert systems, and interpreter design and implementation are all difficult fields. You could get a PhD in any of the three (but you have to choose which).
If you go the embedded interpreter way: study the interpreters I mentioned (Guile, Lua, Python, Neko, etc...) and choose which one you want to embed.
If for whatever reason, you want to make an interpreter from scratch: Learn several programming languages first (including scripting languages like Ruby, Python, Ocaml, Scheme, Lua, Neko, ...). Read books on Programming Language Pragmatics (by M.Scott) and Lisp In Small Pieces (by Queinnec). Read also text books on compilation and parsing, and on Garbage Collection and formal (e.g. denotational) semantics. All this may need a dozen years of work.
Notice that, from experience, embedding software in an interpreter is a design decision that shapes everything else. If you did not think of it at the beginning, you will probably need to redesign and refactor a lot of your existing application. For instance, when embedding software in an interpreter, you cannot afford to let bad input crash the program. So error handling and memory management (interfacing with the GC of the interpreter) are challenging and impose new constraints. Hence you'll need to re-think your application.
If all this is new (and even if you don't choose e.g. Guile as the embedding interpreter): learn and practice a bit of Scheme -e.g. with Guile or PltScheme- (e.g. reading SICP), read a little bit about λ-calculus and closures, then read Queinnec's Lisp In Small Pieces book. Remember the halting problem (which is partly why interpreters are difficult to code).
BTW the syntax you are proposing (e.g. rotate mat 1 by x 90) is not very readable and looks COBOL-like. If possible, make your language look familiar to users of existing ones. Make it easy to read!
Start by reading all the wikipages I am referencing here.
FWIW, I am the main author of MELT, a domain specific language (inspired a lot by Scheme) to extend the GCC compiler. Some of the papers / documentations I wrote might inspire you (and contain valuable references).
Addenda (after question was reformulated)
You seem to be inventing some formal syntax like
add material 1 to layer 1
rotate layer 1 about x-axis by 90 degrees
translate layer 1 in x-axis by 10 inches
I can't guess what kind of language this is. Are you implementing a 3D printer? If so, you should stick to some existing standard formal language in that domain.
I believe that such a COBOL-like syntax is really wrong. The point is that it is too verbose, and that you are trying to implement a domain specific language. I find your example very bad-looking.
Is that syntax your invention, or is there some document specifying your domain specific language (and many thousands of existing lines already coded in it)? If you are just inventing it, please reconsider the syntax and the semantics.
First, you need to specify on paper the full syntax and semantics of your DSL.
Is your DSL Turing complete? (I guess so, because Turing completeness is reached very quickly, e.g. with variables and loops.) If it is, you are inventing a scripting language. Please don't invent a scripting language without knowing several programming and scripting languages (then read Programming Language Pragmatics...). The point is that, if your scripting language becomes successful, advanced users will sooner or later write important programs in it (e.g. many thousands of lines). Then, these advanced users will be programmers. In that case, it is very important (for social and economic reasons) to have a DSL that is well founded and looks familiar (if possible, an extension of some existing scripting language).
If your DSL already exists, stick to its specification on paper. If that specification is not good enough, improve it with formalization (e.g. by writing some BNF syntax, and some formal (e.g. denotational) semantics for it). Publish and discuss that formalization with existing users.
Several industries ended up with ad-hoc DSLs which became widely used but were ill designed (e.g., in the French nuclear industry, the Gibiane DSL designed in the 1970s by nuclear physicists, not computer scientists; the US Boeing corporation is also rumored to have made similar mistakes). Then, maintaining and improving the many hundreds of thousands of lines of DSL scripts becomes a nightmare (and may mean losing millions of dollars or euros). So you had better stick to some existing scripting language. The advantages are that a culture exists around it (e.g. you can find dozens of books on Python or Lua, and many trained engineers familiar with them), that the interpreter is widely used and tested, and that the community working on it keeps improving the interpreter, so it has quite few uncorrected bugs.
You should not attempt to design and implement your own DSL if you are not a trained computer scientist. Stick to some existing scripting language (of course its syntax will not be exactly what you want), and leverage existing implementations and experience.
As a counter-example, J.Ousterhout invented the widely used Tcl scripting language with the claim that scripts are always small (e.g. only hundreds of lines) and won't grow into a big code base; unfortunately, some of them did, and Tcl is known as a bad language for writing many tens of thousands of lines (even if Tcl is an easy and convenient language for tiny scripts). The moral of the story is that if a (Turing-complete) scripting language becomes successful, some "crazy" advanced user will write hundreds of thousands of lines of script code. So you need that scripting language to be well designed from the start. Hence, you should adopt and adapt a good existing scripting language (and avoid inventing an unfamiliar syntax without having a good knowledge of several existing scripting languages).
later additions
PS: my criticism of Tcl is not entirely subjective: the point is that Tcl was designed for small scripts in mind (read J.Ousterhout's first papers about Tcl), but my point is that when you offer a Turing-complete scripting language, some "crazy" user will eventually write huge scripts for it. Hence, you need to anticipate such "crazy" usage by offering a scripting language which "scales up" to big scripts, so is built according to software engineering practices for large software code base.
NB. Lua is probably a good choice as a language to embed. It is small, has a nice implementation, is well documented, and has good performance. But be careful about memory management issues (and this advice holds for any scripting language).
EDIT: To be more clear, I would like to have a short list of key words
(<15). The order/presence of which would determine which function will
be run.
You can build a small ruleset engine (e.g. something that processes lists of words). You write that engine/function once and just pass the data structures to it.
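To make the ruleset idea concrete, here is a minimal sketch, assuming every command is a fixed sequence of literal keywords and typed slots. The names `Rule`, `match`, `dispatch` and the `<int>` / `<num>` / `<word>` placeholders are made up for illustration; they are not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a command line into whitespace-separated tokens.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// True if the whole token parses as an integer.
bool is_int(const std::string& s) {
    if (s.empty()) return false;
    char* end = nullptr;
    std::strtol(s.c_str(), &end, 10);
    return *end == '\0';
}

// True if the whole token parses as a floating-point number.
bool is_num(const std::string& s) {
    if (s.empty()) return false;
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return *end == '\0';
}

// A rule is a name plus a pattern of literal keywords and typed slots
// (hypothetical slot syntax: "<int>", "<num>", "<word>").
struct Rule {
    std::string name;
    std::vector<std::string> pattern;
};

// Check one rule against the tokens. On success, `args` holds the slot
// values in order. On failure, `error` describes the first mismatch.
bool match(const Rule& rule, const std::vector<std::string>& tokens,
           std::vector<std::string>& args, std::string& error) {
    assert(!rule.pattern.empty());
    args.clear();
    if (tokens.size() != rule.pattern.size()) {
        error = rule.name + ": expected " + std::to_string(rule.pattern.size()) +
                " words, got " + std::to_string(tokens.size());
        return false;
    }
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& want = rule.pattern[i];
        const std::string& got = tokens[i];
        bool ok;
        if (want == "<int>") ok = is_int(got);
        else if (want == "<num>") ok = is_num(got);
        else if (want == "<word>") ok = true;
        else ok = (want == got);
        if (!ok) {
            error = rule.name + ": word " + std::to_string(i + 1) +
                    ": expected " + want + ", got \"" + got + "\"";
            return false;
        }
        if (want[0] == '<') args.push_back(got);  // keep slot values only
    }
    return true;
}

// Pick the rule whose first keyword matches, then check the rest of it.
// Returns the rule name, or "" with `error` set.
std::string dispatch(const std::vector<Rule>& rules, const std::string& line,
                     std::vector<std::string>& args, std::string& error) {
    const std::vector<std::string> tokens = tokenize(line);
    if (tokens.empty()) { error = "empty command"; return ""; }
    for (const Rule& r : rules) {
        if (r.pattern.empty() || r.pattern.front() != tokens.front()) continue;
        return match(r, tokens, args, error) ? r.name : "";
    }
    error = "unknown command: " + tokens.front();
    return "";
}
```

With a rule such as `{"rotate", {"rotate", "layer", "<int>", "about", "<word>", "by", "<num>", "degrees"}}`, adding or changing a command means editing one row of data instead of a tree of nested ifs. This sketch selects a rule by its first keyword only, so two commands starting with the same word would need a smarter selection step.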
As an alternative, a solution using regular expressions would be probably the fastest to code (the engine is ready for you), assuming you're familiar with the regexp syntax (if not, it's still a good investment).
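A rough sketch of the regular expression route, using `std::regex` from the standard library on the "rotate" command from the question. The `Rotate` struct and `parse_rotate` function are hypothetical names, and the exact grammar (digits for the layer, x/y/z for the axis) is an assumption about what the command should accept.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Parsed form of one command (hypothetical type for this sketch).
struct Rotate {
    int layer;
    char axis;
    double degrees;
};

// Parse "rotate layer <int> about <x|y|z>-axis by <number> degrees".
// Returns true and fills `out` on success; otherwise sets `error`.
bool parse_rotate(const std::string& line, Rotate& out, std::string& error) {
    static const std::regex pattern(
        R"(\s*rotate\s+layer\s+(\d{1,6})\s+about\s+([xyz])-axis\s+by\s+([-+]?\d+(?:\.\d+)?)\s+degrees\s*)");
    std::smatch m;
    // regex_match succeeds only if the whole line fits the pattern.
    if (!std::regex_match(line, m, pattern)) {
        error = "expected: rotate layer <int> about <x|y|z>-axis by <number> degrees";
        return false;
    }
    assert(m.size() == 4);  // whole match plus three capture groups
    out.layer = std::stoi(m[1].str());
    out.axis = m[2].str()[0];
    out.degrees = std::stod(m[3].str());
    return true;
}
```

One pattern per command keeps the type checks (int, float, keyword) in a single place, at the cost of a fairly coarse error message: the regex only says that the line did not match, not which word was wrong.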
You could build a table of keywords and function pointers:
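    // Handlers are assumed to be defined elsewhere.
    void Process_Car(void);
    void Process_Bike(void);
    typedef void (*Function_Pointer)(void);
    struct table_entry
    {
        const char * keyword;
        Function_Pointer p_function;
    };
    // Each row maps a keyword to the function that handles it.
    table_entry function_table[] =
    {
        {"car", Process_Car},
        {"bike", Process_Bike},
    };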
Search the table for a keyword. If the keyword is found, dereference the function pointer.
The following snippet will execute the function for processing the word "car":
(function_table[0].p_function)();
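The search step described above could look something like the following sketch. The `dispatch` function and the call counters are additions for illustration (the counters only exist so the effect of a call can be observed); the table itself is the one shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef void (*Function_Pointer)(void);

struct table_entry {
    const char* keyword;
    Function_Pointer p_function;
};

// Hypothetical handlers: they just count how often they were called.
static int car_calls = 0;
static int bike_calls = 0;
void Process_Car(void) { ++car_calls; }
void Process_Bike(void) { ++bike_calls; }

static const table_entry function_table[] = {
    {"car", Process_Car},
    {"bike", Process_Bike},
};
static const std::size_t table_size =
    sizeof(function_table) / sizeof(function_table[0]);

// Linear search over the table. Calls the handler and returns true if the
// keyword is known; returns false otherwise so the caller can report it.
bool dispatch(const char* keyword) {
    assert(keyword != nullptr);
    for (std::size_t i = 0; i < table_size; ++i) {
        if (std::strcmp(function_table[i].keyword, keyword) == 0) {
            function_table[i].p_function();
            return true;
        }
    }
    return false;
}
```

A linear search is plenty for fewer than 15 keywords; for a larger table, a `std::map` or `std::unordered_map` from keyword to function pointer would do the same job.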
There is a famous program, called Eliza, which parses sentences for keywords.
Examples can be found at: Eliza C++ examples

XSLT XPath style guide / best practice / coding standard?

Does there exist an XSLT / XPath style guide / coding standard / best practice reference?
In particular I'm maintaining a bunch of XSLT scripts which are demonstrably fragile and unmaintainable.
E.g. adding a single level of nesting to the XML requires hundreds of changes to the scripts, even in templates that operate on subtree fragments that are unchanged.
For procedural languages there is a well-established literature of Object Oriented Design Principles (SOLID, LSP, ...) and Coding Standards. (Don't use global variables, reduce coupling, improve cohesion, encapsulate state...)
Where do I find the equivalent for XSLT?
There's no single compact document of the kind you are looking for. Any book on XSLT is likely to be packed with advice, but most of it is of the kind that a good programmer will do anyway. You can't turn a bad programmer into a good programmer by writing coding standards, in my view.
On the particular problem that a small change to the XML requires large changes to the XSLT, this is specifically what the rule-based template approach of XSLT is designed to prevent. Beginners in XSLT are often slow to adopt this coding style, and instead use a more "procedural" style (for-each, if, choose, call-template) because it's closer to what they have encountered with other languages. Forums like this one are full of advice from experienced developers to use template rules and apply-templates more extensively, and this is precisely the reason. So a one-line style guide for XSLT would be simply: use template rules as much as you possibly can.
One problem with coding standards is of course that there are conflicting objectives. You will often find people advising against use of "//x", but that's actually a trade-off: //x improves flexibility (resilience to source document change) at the expense of performance (with some XSLT processors), so any such advice reduced to a one-liner can be unhelpful.

What are the advantages of using XSL in Sitecore instead of C#?

While learning Sitecore I have found that the majority of Sitecore sample code on the web is in XSL instead of .NET.
What would be the advantage of choosing XSL over the processes I have become accustomed to as a .NET developer?
Are there processing speed advantages to using XSL?
Is XSL actually easier once you are comfortable with the syntax?
I'll just add my 2 cents too:
I find that there are too many limitations in XSLT that need to be overcome, either with external "libraries" or by developing a method in C# that can be called from XSLT.
So I find using ASP.NET simpler. But then I'm also a lot better with ASP.NET than with XSLT.
But XSLT has some good things:
good when getting fields from the current context item
good with simple content etc.
doesn't force the solution to recycle/rebuild
usually fails gracefully, i.e. the page still works, and the XSLT rendering that failed reports the failure
When I first started working with Sitecore, my company used quite a bit of XSLT, but we've slowly moved away from that, because of its limitations and because most people here are more familiar with ASP.NET/C#.
Some folks prefer XSL because of existing team skill set, the availability of XSL talent, or the belief that XSL is easier or cheaper to learn.
In Sitecore, ASP.NET-based sublayouts actually perform much better than XSL renderings. If that's what you are comfortable with, go for it. I've never created an XSL rendering myself.
XSLT is a powerful language; its main advantages over languages like ASP.NET tend to come when you want to reuse and customize logic over a wide variety of different pages or different source document structures with common shared elements and other variable structures. To achieve this it uses a rule-based processing model which some people find quite difficult to get to grips with on first encounter. Learning it is an investment that will pay off over time, but it can be daunting at first.
As for performance, I've never come across a site where it isn't fast enough for the job, and that includes some pretty high-stress services; when people have had performance problems they've usually turned out to be in other parts of the processing pipeline (or simply due to bad coding).
The choice between XSLT and .Net components in Sitecore is largely one of taste and skillset. XSLT in Sitecore does have some drawbacks though - it tends to be outperformed by .NET components for all but the most simple renderings and the places where it might seem most logical to use it, such as replicating content tree structure as a site menu, are actually those that tend to take the biggest performance hit. In the right situations XSLT is an incredibly powerful tool and well worth learning, but I've yet to see a convincing argument for making much use of it in Sitecore. It's also worth noting that some of the standard patterns of XSLT programming aren't the most efficient in Sitecore.
The only real advantage I can think of, would be that XSLT renderings are easier to deploy in isolation. Say, for instance, that you're updating your "News Spots" rendering and you want to deploy this change to test/production right away - it would be a simple case of uploading the .xsl file itself.
Using .NET development (and enduring the Web Application Project model), a deployment of the code base would implicitly deploy any and all changes to the affected assemblies - including whatever work you have in progress.
There are, of course, ways you can manage this. Source code branching/merging and so on - but that's an additional layer of complexity to your solution.
That being said, I use .NET for well over 95% of all my Sitecore development myself :-)
"In summary, a primary goal of software design and coding is conquering complexity. The motivation behind many programming practices is to reduce a program's complexity. Reducing complexity is a key to being an effective programmer." -Steve McConnell (1993)
Let that guide when to use XSLT over C#.

Is there an alternative to HTML Tidy?

I have embedded HTML Tidy in my application to clean incoming HTML. But Tidy has a huge number of bugs and fixing them directly in the source is my worst nightmare. Tidy source code is an unreadable abomination. Thousand+ line functions, poor variable naming, spaghetti code etc. It's truly horrible.
Worse yet, official development seems to have ceased. In the last 12 months, there have been three write transactions to the official CVS repo. But it's been dead and buried for much longer than that...
So I'm looking for an OSS C or C++ application/library that can do what Tidy can (when it feels like it): fix bad HTML markup and transform it into valid XHTML (this is the part I'm interested in). And I mean all sorts of bad markup.
Is there something like that out there?
EDIT: I need it both for manipulations on the DOM tree by an XML handling tool and for general compliance with the XHTML spec. My app needs to accept HTML from users (which is often invalid in all sorts of ways) and output valid XHTML. It needs to be able to handle even HTML that would normally not display in a browser because the user edited it by hand and didn't check afterwards.
A drop-in replacement for Tidy's error-correcting parser... that doesn't suck. I don't mind bugs if the source is readable and I can fix problems myself, or if there are active developers who provide bugfixes on a timely basis.
Could you tell us what you plan to use this tool for? As in, do you want to fix static web pages, or do you want some sort of filtering step before other manipulations, so that some tool can handle buggy web pages?
Personally, I write my own tool atop Python's BeautifulSoup or lxml whenever I need to; it's at most a dozen-line script and does much of what I want.
There is a new, nice, proper HTML 5 supporting Tidy, so the alternative to old, ugly Tidy would be Tidy (GitHub repository).
Try Pretty Diff. It is a vastly superior beautification algorithm and it does not make any assumptions about your input.
http://prettydiff.com/?m=beautify&html
For something that actually fixes code, your best bet is still HTML Tidy. There are a lot of linters, but not really anything that repairs errors to HTML, other than Tidy.
At first glance, modern OOP programmers might think that the source code is an unreadable abomination, but in the C world, Tidy is a pretty sophisticated library that uses a lot of advanced OO concepts and offers a very thoughtful interface that exposes nearly all of its functionality in a pure C API.
A casual developer will be lost, but once immersed, the code is quite beautiful. Granted, naming conventions are a mixed bag, but PRs are welcome!

What's the most commonly used XML library for C++? [closed]

I saw a few libraries through a quick Google search. What's generally the most commonly used XML implementation for C++?
I'm planning on using XML as a means for program configuration. I liked XML because I'll be making use of its tree-like structure. If you think you have a more suitable solution for this, feel free to mention it. I want something lightweight and simple. Maybe XML is too much?
Edit: Cross-platform would be preferable, but to answer a question, I'm programming this in Linux.
See if TinyXML helps you
TinyXML is a simple, small, C++ XML parser that can be easily integrated into other programs.
There are several out there:
Xerces (big): http://xerces.apache.org/xerces-c/
Expat (small): http://expat.sourceforge.net/
I like expat. But that's a totally personal opinion.
I use it because it is small and it was simple to write a C++ wrapper for.
Xerces is the full-blown XML parser with all the bells and whistles.
But consequently it is slightly more complex to use.
I would recommend not using XML.
I know this is a matter of opinion but XML really clutters the information with a lot of tags. Also, even though it is human-readable, the clutter actually hampers readability (and I say it from experience since we have some 134 XML configuration files at the moment...). Furthermore, it is quite difficult to read because of the mix between attributes and plain-text. You never know which one you are going to need.
I would recommend using JSON, if you want a language that already has well-defined parsers.
For parsing, a simple look at json.org and you have a long list of C++ libraries.
Not quite the question you asked, but there are two major flavors of XML parsers, SAX and DOM.
SAX parsers are event-driven parsers. As the parser encounters the various parts of the XML document (elements, attributes, text, etc.), it calls some function or method that you have defined.
DOM parsers, on the other hand, parse the entire XML document and return a tree structure that represents the entire document. Your code can then poke through the structure in any order it sees fit.
SAX parsers are more memory efficient because they do not need to represent the entire document in memory. DOM parsers are easier to work with because you are not limited to processing the document in a linear fashion.
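To make the difference concrete, here is a toy sketch of both styles. It is not a real XML parser and does not use any real library's API: `scan`, `Handler`, `Node`, `build_tree` and `count_elements` are invented for illustration, and the scanner understands only `<name>`, `</name>` and plain text (no attributes, comments or entities). It also takes the whole input as a string for brevity, whereas a real SAX parser can read from a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// SAX style: callbacks invoked as the scanner walks the input.
struct Handler {
    std::function<void(const std::string&)> start_element;
    std::function<void(const std::string&)> end_element;
    std::function<void(const std::string&)> text;
};

// Toy scanner: reports each tag and text run to the handler, in order.
void scan(const std::string& xml, const Handler& h) {
    assert(h.start_element && h.end_element && h.text);
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return;  // malformed input: stop
            const std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/') h.end_element(tag.substr(1));
            else h.start_element(tag);
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            h.text(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// SAX-style use: count elements without keeping any tree in memory.
int count_elements(const std::string& xml) {
    int count = 0;
    Handler h;
    h.start_element = [&](const std::string&) { ++count; };
    h.end_element = [](const std::string&) {};
    h.text = [](const std::string&) {};
    scan(xml, h);
    return count;
}

// DOM style: the whole document ends up as a tree you can revisit freely.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> build_tree(const std::string& xml) {
    std::unique_ptr<Node> root(new Node());
    root->name = "#document";
    std::vector<Node*> stack{root.get()};  // path from root to current node
    Handler h;
    h.start_element = [&](const std::string& name) {
        std::unique_ptr<Node> child(new Node());
        child->name = name;
        Node* raw = child.get();
        stack.back()->children.push_back(std::move(child));
        stack.push_back(raw);
    };
    h.end_element = [&](const std::string&) {
        if (stack.size() > 1) stack.pop_back();
    };
    h.text = [&](const std::string& t) { stack.back()->text += t; };
    scan(xml, h);
    return root;
}
```

`count_elements` needs only one integer of state no matter how large the input is, which is the memory advantage mentioned above. `build_tree` pays for the whole tree, but afterwards the caller can visit nodes in any order.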
The XML libraries I've used and am still using are:
libxml2 (http://xmlsoft.org/)
xerces / expat
Xalan-C
If you don't need to use XML then I would suggest not doing so.
I would also avoid modelling what you are reading/writing as C++ classes unless you are using a code generator.
I would also look at using a 'schema to code' generator for reading/writing, though make sure that the licence fits what you are doing.
I highly recommend pugixml
"pugixml is a C++ XML processing library, which consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with Unicode interface variants and conversions between different Unicode encodings."
I have tested a few XML parsers including a few commercial ones before choosing and using pugixml in a commercial product.
pugixml was not only the fastest parser (sometimes several times faster) but also had the most mature and friendly API. I highly recommend it. It is a very stable product! I have been using it since version 0.8. Now it is at 1.7.
The great bonus of this parser is its XPath 1.0 implementation! For any more complex tree queries, XPath is a godsend!
The DOM-like interface with rich traversal/modification capabilities is extremely useful for tackling real-life "heavy" XML files.
It is a small and fast parser. It is a good choice for an iOS or Android app if you do not mind linking C++ code.
I also tested TinyXML. It was not only slower but it had problems with my XML files.
Benchmarks tell a lot:
http://pugixml.org/benchmark.html