XSLT Selection of Nodes Based on Substring of Element Name - xslt

How can I, with XSLT, select nodes based on a substring of the nodes' element name?
For example, consider the XML:
<foo_bar>Keep this.
<foo_who>Keep this, too.
<fu_bar>Don't want this.</fu_bar>
</foo_who>
</foo_bar>
From which I want to output:
<foo_bar>Keep this.
<foo_who>Keep this, too.
</foo_who>
</foo_bar>
Here I want to select for processing those nodes whose names match a regex like "foo.*".
I think I need an XSLT template match attribute expression, or an apply-templates select attribute expression, that applies the regex to the element's name. But maybe this can't be done without some construct like an xsl:if statement?
Any help would be appreciated.

Here is some XSL that finds elements whose names start with "foo" to get you started. As far as I can tell from "Regular Expression Matching in XSLT 2", regex functionality was not added until XSLT 2.0.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
<xsl:variable name="name" select="local-name()"/>
<xsl:if test="starts-with($name, 'foo')">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
It gives this output, which seems to have an extra newline.
<foo_bar>Keep this.
<foo_who>Keep this, too.
</foo_who>
</foo_bar>
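If an XSLT 2.0 processor is available, the regex from the question can go directly into the match pattern. This is only a minimal sketch, assuming XSLT 2.0 (needed for matches()) and assuming that non-matching elements should be dropped together with their content, as in the 1.0 stylesheet above:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy elements whose local name matches the regex ^foo (assumes XSLT 2.0) -->
  <xsl:template match="*[matches(local-name(), '^foo')]">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Every other element is dropped, including everything inside it -->
  <xsl:template match="*"/>
</xsl:stylesheet>
```

The pattern with a predicate has a higher default priority than the bare * pattern, so no explicit priority attribute is needed. In XSLT 1.0 the same layout works with starts-with(local-name(), 'foo') in place of matches().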

Related

How can I combine xslt and regex to find specific strings

I am very new to XSL, and learning regex, so I might be going about this incorrectly, but I would like a way to find strings in XML files, and sometimes those strings must appear in specific elements, or not in specific elements.
e.g., (\w+)\ (\,|\.|\:|\;|\?) finds orphan punctuation but I don't want to search inside <screen> or similar elements, which typically contain commands, output, and so on, and where orphan punctuation is commonplace.
By way of example:
This is an error , because there is a space before the comma and before the period .
This is not an error, because <command>cd ../</command> is a valid command.
Thanks very much.
To use regular expressions with XSLT you need XSLT 2.0 or later, and it's then very simple:
<!-- Match errors -->
<xsl:template match="text()[matches(., '\s[.,:;?!]')]"
mode="look-for-bad-punctuation" priority="5">
<bad-punctuation-found/>
</xsl:template>
<!-- Match unchecked elements -->
<xsl:template match="screen/text() | command/text()"
mode="look-for-bad-punctuation" priority="6">
<xsl:copy-of select="."/>
</xsl:template>
<!-- Match elements with no error -->
<xsl:template match="text()"
mode="look-for-bad-punctuation" priority="4">
<xsl:copy-of select="."/>
</xsl:template>
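For anyone new to modes: the three templates above only fire when something applies templates in the look-for-bad-punctuation mode. A minimal sketch of an entry point that could sit in the same stylesheet follows; the report wrapper element is a made-up name used purely for illustration:

```xml
<!-- Hypothetical entry point: process the whole document in the checking mode -->
<xsl:template match="/">
  <report>
    <xsl:apply-templates mode="look-for-bad-punctuation"/>
  </report>
</xsl:template>
```

Elements have no template of their own in this mode, so the built-in rule descends into their children in the same mode until one of the text() templates matches.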

What is the meaning of '.?select=*.xml;recurse=yes' in xslt string?

I've seen Michael Kay kindly respond to XSL questions with a template something like shown below.
I'm wondering where the syntax for the string being passed to collection() is documented? I tried searching the XSL spec for some sort of wildcard pattern with recursion but came up empty.
<xsl:template name="main">
<xsl:for-each select="collection('.?select=*.xml;recurse=yes')">
<xsl:result-document href="out/{tokenize(document-uri(.), '/')[last()]}">
<xsl:apply-templates select="."/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
Per Martin's link, this special behavior appears to be native to StandardCollectionURIResolver in Saxon, which interprets ?select=*.xml;recurse=yes as a query string: select provides file globbing and recurse provides automatic directory recursion.

Some correct XPath expressions don't work as expected

I'm doing some XML transforms on a file generated by WiX after harvesting registry data. For those who are unfamiliar with WiX, just consider that I'm trying to transform an XML file, no matter where it comes from. The issue I'm experiencing is this: when I use an XPath like
match="node()[name() = 'File'][not(@KeyPath)]"
the matching works fine and it finds all the File nodes that are missing the KeyPath attribute. However, if I use another XPath expression like
match="//File[not(@KeyPath)]"
then it doesn't find any match.
In general I cannot use the standard XPaths with /, //, ., or .., although the piece below, which contains an XPath, works well:
<xsl:template match="node()[name() = 'File']
[contains(@Source, 'First.dll') or
contains(@Source, 'Second.dll')]
[not(@Assembly)]">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
<xsl:attribute name="Assembly">.net</xsl:attribute>
<xsl:attribute name="KeyPath">yes</xsl:attribute>
</xsl:copy>
</xsl:template>
but something like /bookstore/book[@lang='en'] would not work. Perhaps I'm missing some declarations at the beginning of my XSL file that enable recognition of XPaths like this.
This is because the nodes in the XML file are almost certainly in a namespace
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
Your expression //File[not(@KeyPath)] is looking for a File element that is in no namespace. You need to account for the namespace in your XSLT.
So, bind a prefix to it on the xsl:stylesheet like so...
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wix="http://schemas.microsoft.com/wix/2006/wi" />
Then your match expression becomes this..
<xsl:template match="//wix:File[not(@KeyPath)]" />
In fact, the // is not necessary here in a match. This will work too
<xsl:template match="wix:File[not(@KeyPath)]" />
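Putting the pieces together, a complete stylesheet might look like the sketch below. It assumes the WiX namespace URI shown above and assumes the goal is to add KeyPath="yes" to File elements that lack it; the identity template is a standard addition, not something taken from the question:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wix="http://schemas.microsoft.com/wix/2006/wi">
  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Namespace-qualified match for File elements without a KeyPath attribute -->
  <xsl:template match="wix:File[not(@KeyPath)]">
    <xsl:copy>
      <!-- Attributes must be written before any child nodes -->
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="KeyPath">yes</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The wix prefix is arbitrary; only the namespace URI has to match the one declared on the source document's root element.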

How to select these elements with Xpath?

I have a document, something like this:
<root>
<A node="1"/>
<B node="2"/>
<A node="3"/>
<A node="4"/>
<B node="5"/>
<B node="6"/>
<A node="7"/>
<A node="8"/>
<B node="9"/>
</root>
Using xpath, How can I select all B elements that consecutively follow a given A element?
It's something like following-sibling::B, except I want only the immediately following elements.
If I am on A (node==1), then I want to select node 2.
If I am on A (node==3), then I want to select nothing.
If I am on A (node==4), then I want to select 5 and 6.
Can I do this in xpath? EDIT: It is within an XSL stylesheet select statement.
EDIT2: I don't want to use the node attribute on the various elements as a unique identifier. I included the node attribute only for purposes of illustrating my point. In the actual XML doc, I don't have an attribute that I can use as a unique identifier. The xpath "following-sibling::UL[preceding-sibling::LI[1]/@node = current()/@node]"
keys on the node attribute, and that's not what I want.
Short answer (assuming current() is ok, since this is tagged xslt):
following-sibling::B[preceding-sibling::A[1]/@node = current()/@node]
Example stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:template match="/">
<xsl:apply-templates select="/root/A"/>
</xsl:template>
<xsl:template match="A">
<div>A: <xsl:value-of select="@node"/></div>
<xsl:apply-templates select="following-sibling::B[preceding-sibling::A[1]/@node = current()/@node]"/>
</xsl:template>
<xsl:template match="B">
<div>B: <xsl:value-of select="@node"/></div>
</xsl:template>
</xsl:stylesheet>
Good luck!
While @Chris Nielsen's answer is the right approach, it leaves an uncertainty in cases where the compared attribute is not unique. The more correct way of solving this is:
following-sibling::B[
generate-id(preceding-sibling::A[1]) = generate-id(current())
]
This makes sure that the preceding-sibling::A is identical to the current A, instead of just comparing some attribute values. Unless you have attributes that are guaranteed to be unique, this is the only safe way.
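A minimal sketch of a stylesheet built around the generate-id() comparison, using the sample document from the question. The groups, group and b output element names are made up for illustration; the node attribute is only echoed into the output and plays no part in the selection:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/root">
    <groups>
      <xsl:apply-templates select="A"/>
    </groups>
  </xsl:template>
  <xsl:template match="A">
    <group a="{@node}">
      <!-- current() is this A; keep only the B siblings whose nearest preceding A is this one -->
      <xsl:for-each select="following-sibling::B[generate-id(preceding-sibling::A[1]) = generate-id(current())]">
        <b><xsl:value-of select="@node"/></b>
      </xsl:for-each>
    </group>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document, the group for A 1 holds B 2, the group for A 4 holds B 5 and B 6, the group for A 8 holds B 9, and the groups for A 3 and A 7 are empty.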
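A solution might be to first gather up all the following nodes using following-sibling::*, grab the first of these and require it to be a 'B' node.
following-sibling::*[position()=1][name()='B']
Note that this returns at most one element (the B directly after the context node, if there is one), so on A (node==4) it selects 5 but not 6.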

XSLT to Select Desired Elements When Nested In Not-Desired Elements

What XSLT would I use to extract some nodes to output, ignoring others, when the nodes to be extracted are sometimes nested inside nodes to be ignored?
Consider:
<alpha_top>This prints.
<beta>This doesn't.
<alpha_bottom>This too prints.</alpha_bottom>
</beta>
</alpha_top>
I want a transform that produces:
<alpha_top>This prints.
<alpha_bottom>This too prints.</alpha_bottom>
</alpha_top>
This answer shows how to select nodes based on the presence of a string in the element tag name.
Ok, here is a better way
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="beta">
<xsl:apply-templates select="*"></xsl:apply-templates>
</xsl:template>
<xsl:template match="/|*|text()">
<xsl:copy>
<xsl:apply-templates select="*|text()"></xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This basically does an identity transform, but for the element you don't want to include I removed the xsl:copy and only applied templates on the child elements.
The following stylesheet works on your particular case, but I suspect you are looking for something a bit more generic. I'm also sure there is a simpler way.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates select="alpha_top"></xsl:apply-templates>
</xsl:template>
<xsl:template match="alpha_top">
<xsl:copy>
<xsl:apply-templates select="beta/alpha_bottom|text()"></xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="*|text()">
<xsl:copy>
<xsl:apply-templates select="*|text()"></xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I think that once you have a reasonable understanding of how XSLT traversal works (hopefully I answered that in your other question), this becomes quite simple.
You have several choices on how to do this. Darrell Miller's answer shows you how to process a whole document and strip out the elements you're not interested in. That's one approach.
Before I go further, I get the impression that you might not entirely 'get' the concept of context in XSLT. This is important and will make your life simpler. At any time in XSLT there is one and only one context node. This is the node (element, attribute, comment, etc.) currently being 'processed'. Inside a template invoked via xsl:apply-templates, the node that has been selected is the context node. So, given your XML:
<alpha_top>This prints.
<beta>This doesn't.
<alpha_bottom>This too prints.</alpha_bottom>
</beta>
</alpha_top>
and the following:
<xsl:apply-templates select='beta'/>
and
<xsl:template match='beta'>...</xsl:template>
the beta node will be the context node inside the template. There's a bit more to it than that but not much.
So, when you start your stylesheet with something like:
<xsl:template match='/'>
<xsl:apply-templates select='alpha_top'/>
</xsl:template>
you are selecting the children of the document node (the only child element is the alpha_top element). Your xpath statement inside there is relative to the context node.
Now, in that top level template you might decide that you only want to process your alpha_top nodes, wherever they occur in the document. Then you could put in a statement like:
<xsl:template match='/'>
<xsl:apply-templates select='//alpha_top'/>
</xsl:template>
This would walk down the tree and select all alpha_top elements and nothing else.
Alternatively you could process all your elements and simply ignore the content of the beta node:
<xsl:template match='beta'>
<xsl:apply-templates/>
</xsl:template>
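(as I mentioned in my other reply to you, xsl:apply-templates with no select attribute is the same as using select='node()', so it processes all child nodes, text included).
This will drop the beta element itself but process all of its children (assuming you have templates for them). Keep in mind that the built-in template for text nodes copies the text through, so add an empty template for beta/text() if beta's own text should disappear too.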
So, ignoring elements in your output is basically a matter of using the correct xpath statements in your select attributes. Of course, you might want a good xpath tutorial :)
Probably the simplest solution to your problem is this:
<xsl:template match="alpha_top|alpha_bottom">
<xsl:copy>
<xsl:value-of select="text()" />
<xsl:apply-templates />
</xsl:copy>
</xsl:template>
<xsl:template match="text()" />
This does not exhibit the same white-space behavior you have in your example, but this is probably irrelevant.
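If the elements to keep are identified by a name prefix rather than listed one by one, the same idea can be combined with starts-with(). This is a rough sketch, assuming that "wanted" means a local name beginning with alpha_ and that unwanted elements should lose their own text while their wanted descendants survive:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Keep any element whose name starts with alpha_, along with its own text -->
  <xsl:template match="*[starts-with(local-name(), 'alpha_')]">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Any other element: drop the wrapper and its own text, but still visit child elements -->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

Text directly inside a kept element is copied by the built-in text template, while text directly inside beta is never selected, so it does not appear in the output.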