Please, I'm trying to extract "plain text" from "annotated text" (or plain content from complex content).
This is the input XML I have:
<l>string</l>
<l>string<g><b/>string2</g></l>
<l>string<g><b/>string2<b/>string3</g></l>
<l>string<b/>string2<b/>string3</l>
and this is the output I need:
<word>string</word>
<word>string string2</word>
<word>string string2 string3</word>
<word>string string2 string3</word>
Essentially: (i) I do not need the <g> element, and (ii) empty elements should be replaced by blank spaces.
Many thanks!
You could achieve this by using the identity transform, but overriding it for your special cases, like so:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>
<!-- Replace elements under root element with word element -->
<xsl:template match="/*/*">
<word>
<xsl:apply-templates select="node()"/>
</word>
</xsl:template>
<!-- Match, but don't copy, elements -->
<xsl:template match="@*|node()">
<xsl:apply-templates select="@*|node()"/>
</xsl:template>
<!-- Copy out text nodes -->
<xsl:template match="text()">
<xsl:copy/>
</xsl:template>
<!-- Replace empty element by space -->
<xsl:template match="*[not(node())]">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
When applied on the following XML
<data>
<l>string</l>
<l>string<g><b/>string2</g></l>
<l>string<g><b/>string2<b/>string3</g></l>
<l>string<b/>string2<b/>string3</l>
</data>
The output is as follows:
<word>string</word>
<word>string string2</word>
<word>string string2 string3</word>
<word>string string2 string3</word>
NOTE: I am using xsltproc on OS X Yosemite.
The source content for an XSLT transformation is HTML. Some
text nodes contain line breaks (<br/>). In the transformed
content (an XML file), I wish to convert the line breaks to spaces.
For example, I have:
<div class="location">London<br />Hyde Park<br /></div>
I want to transform this element like so:
<xsl:element name="location">
<xsl:variable name="location" select="div[@class='location']"/>
<xsl:value-of select="$location"/>
</xsl:element>
What happens is that the <br /> elements are simply removed from the output:
<location>LondonHyde Park</location>
I do have other templates that are involved:
<xsl:template match="node()|script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
What XSLT operations are required to transform the <br />'s here
to a single space?
I would use xsl:apply-templates instead of xsl:value-of and add a template to handle <br/>.
You would also need to modify <xsl:template match="node()|script"/> because node() also selects text nodes. You can replace node() with processing-instruction()|comment() if you need to, but they would not be output by default anyway.
Here's a working example:
Input
<div class="location">London<br />Hyde Park<br /></div>
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="div[@class='location']">
<location><xsl:apply-templates/></location>
</xsl:template>
<xsl:template match="br">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
Output
<location>London Hyde Park </location>
If you don't want the trailing space, you could either...
put the xsl:apply-templates in a variable ($var) and use normalize-space() in an xsl:value-of. Like: <xsl:value-of select="normalize-space($var)"/>
update the match for the br element. Like: br[not(position()=last())]
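The first option above could be sketched as follows. This is a minimal sketch, assuming the same input as in the working example; the variable name var is just the placeholder used in the description. In XSLT 1.0 the variable holds a result tree fragment, which normalize-space() converts to a string, trimming the trailing space and collapsing repeated whitespace.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="script"/>

  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="div[@class='location']">
    <!-- Collect the processed content first (text plus one space per <br/>) -->
    <xsl:variable name="var">
      <xsl:apply-templates/>
    </xsl:variable>
    <!-- Then trim leading/trailing whitespace before writing it out -->
    <location><xsl:value-of select="normalize-space($var)"/></location>
  </xsl:template>

  <!-- Each line break becomes a single space -->
  <xsl:template match="br">
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Note that normalize-space() flattens the content to a plain string, so this only suits elements whose children are text and <br/>; any other markup inside the div would be lost.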
I'm pretty sure the answer to this is no, but since the only alternative is what I deem inelegant code, I thought I'd throw this out and see if I'm missing something while hoping this hasn't been asked.
Given this source XML:
<root>
<p>Hello world</p>
<move elem="content" item="test"/>
<p>Another text node.</p>
<content item="test">I can't <b>figure</b> this out.</content>
</root>
I want this result:
<root>
<block>Hello world</block>
<newContent>I can't <hmmm>figure</hmmm> this out.</newContent>
<block>Another text node.</block>
</root>
An ordinary language description:
1. Replace <move .../> with the result of processing the element whose name matches move's @elem attribute and whose @item matches move's @item attribute (e.g., in this case the content of the <content> element is processed, so <b> is replaced by <hmmm>).
2. Prevent the element from step 1 from being written out to the result tree in its original document order.
The problem is the input XML document will be considerably more complex and variable. And the stylesheet is a third-party transform that I am extending. The template I'd have to copy in order to use a mode-based solution is pretty significant in size and that seems inelegant to me. I know, for example, this would work:
<xsl:template match="b">
<hmmm>
<xsl:apply-templates/>
</hmmm>
</xsl:template>
<xsl:template match="p">
<block>
<xsl:apply-templates/>
</block>
</xsl:template>
<xsl:template match="move">
<xsl:variable name="elem" select="@elem"/>
<xsl:variable name="item" select="@item"/>
<xsl:apply-templates select="//*[name()=$elem and @item=$item]" mode="copy-and-process"/>
</xsl:template>
<xsl:template match="content"/>
<xsl:template match="content" mode="copy-and-process">
<newContent><xsl:apply-templates/></newContent>
</xsl:template>
What I would like to do is have the <xsl:template> that matches "content" be sensitive to what node pushes to it. So, that I can have an <xsl:template match="content"/> that is only executed (and therefore its matching node and children are suppressed) when the node pushed from is <root> and not <move>. The virtue in this is that if the third-party stylesheet's relevant template is updated, I don't have to worry about updating a copy of the stylesheet that processes the <content> node. I'm pretty sure this isn't possible, but I thought it was worth asking about.
Simply do:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kMover" match="move" use="concat(@elem,'+',@item)"/>
<xsl:key name="kToMove" match="*" use="concat(name(),'+',@item)"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="move">
<newContent>
<xsl:apply-templates mode="move" select=
"key('kToMove', concat(@elem,'+',@item))/node()"/>
</newContent>
</xsl:template>
<xsl:template match="p">
<block><xsl:apply-templates/></block>
</xsl:template>
<xsl:template match="b" mode="move">
<hmmm><xsl:apply-templates/></hmmm>
</xsl:template>
<xsl:template match="*[key('kMover', concat(name(),'+',@item))]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<root>
<p>Hello world</p>
<move elem="content" item="test"/>
<p>Another text node.</p>
<content item="test">I can't <b>figure</b> this out.</content>
</root>
the wanted, correct result is produced:
<root>
<block>Hello world</block>
<newContent>I can't <hmmm>figure</hmmm> this out.</newContent>
<block>Another text node.</block>
</root>
I have an external settings file whose nodes hold attribute values from the main XML document. I need to remove certain nodes from the main XML file if their attribute values appear in the settings file.
My setting file looks like this:
setting.xml
<xml>
<removenode titlename="abc" subtitlename="xyz"></removenode>
<removenode titlename="dvd" subtitlename="dvd"></removenode>
</xml>
Main.xml
<xml>
<title titlename="abc">
<subtitle subtitlename="xyz"></subtitle>
</title>
<title titlename="book">
<subtitle subtitlename="book sub title"></subtitle>
</title>
</xml>
I need a script that reads setting.xml and removes the title element from main.xml if its titlename and subtitlename are both found there. The output should be
output.xml
<xml>
<title titlename="book">
<subtitle subtitlename="book sub title"></subtitle>
</title>
</xml>
I tried using document() to read the setting.xml file, but I could not work out how to do the match against main.xml.
<xsl:variable name="SuppressionSettings" select="document('Setting.xml')" />
<xsl:variable name="SuppressSetting" select="$SuppressionSettings/xml/removenode" />
.
Any hint how to implement it?
The key is to use an identity/copy pattern and, before each output, check the current (context) node isn't prohibited by the suppression rules nodeset.
<!-- get suppression settings -->
<xsl:variable name='suppression_settings' select="document('http://www.mitya.co.uk/xmlp/settings.xml')/xml/removenode" />
<!-- begin identity/copy -->
<xsl:template match="node()|@*">
<xsl:if test='not($suppression_settings[@titlename = current()/@titlename and @subtitlename = current()/subtitle/@subtitlename])'>
<xsl:copy>
<xsl:apply-templates select='node()|@*' />
</xsl:copy>
</xsl:if>
</xsl:template>
You can run it here (see output source - the 'abc' title node is omitted):
http://www.xmlplayground.com/9oCYKp
The XSLT below works for the given document.
Note that I'm storing the contents of Setting.xml in a variable as you did; however, I then use that variable directly in my queries.
An important issue here is that in XSLT 1.0, variables cannot be used in the match attribute of a template. Therefore, my template matches any <title> element and then determines in an <xsl:choose> element whether the attributes match any values given in the settings file. If so, the <title> element is omitted from the output.
As an explanation for why that test attribute in the <xsl:when> does what it should, imagine a comparison of someAttribute = someOtherAttribute not as a restriction that the attribute someAttribute must have the same value as the attribute someOtherAttribute, but rather as the condition that there must be any two attributes someAttribute and someOtherAttribute with the same value.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="SuppressionSettings" select="document('Setting.xml')" />
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//title">
<xsl:choose>
<xsl:when test="(@titlename = $SuppressionSettings/xml/removenode/@titlename) and (subtitle/@subtitlename = $SuppressionSettings/xml/removenode/@subtitlename)"/>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
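One consequence of the "any two attributes" comparison described above is that the two conditions are checked independently: a <title> would also be dropped if its titlename matched one removenode and its subtitlename matched a different one. If both values must come from the same removenode, a variant along these lines may be closer to the intent. This is only a sketch; the variable name rules is illustrative, and the file name Setting.xml is taken from the question's snippet.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- All suppression rules from the external settings file -->
  <xsl:variable name="rules" select="document('Setting.xml')/xml/removenode"/>

  <!-- Identity rule: copy everything unless overridden -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy a title only if no single removenode carries both of its values -->
  <xsl:template match="title">
    <xsl:if test="not($rules[@titlename = current()/@titlename
                             and @subtitlename = current()/subtitle/@subtitlename])">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Inside the predicate the context node is a removenode, so current() is needed to refer back to the <title> being processed.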
Here's a more generic answer where the names of the attributes are not hard coded into the XSLT. Like O. R. Mapper pointed out, in XSLT 1.0 you can't use variable references in the match, so I put the document() directly in the predicate. This may not be as efficient as using a variable and then testing the variable.
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[@* = document('setting.xml')/*/removenode/@*]"/>
</xsl:stylesheet>
XML Output (using your 2 xml files with main.xml as the input)
<xml>
<title titlename="book">
<subtitle subtitlename="book sub title"/>
</title>
</xml>
I'm new to XSLT. I have a block code that I don't understand.
In the following block, what do '*', '*[@class='vcard']' and '*[@class='fn']' mean?
<?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html" encoding="utf-8"/> <xsl:template match="/">
<script type="text/javascript">
<xsl:text><![CDATA[function show_hcard(info) {
win2 = window.open("about:blank", "HCARD", "width=300,height=200," + "scrollbars=no menubar=no, status=no, toolbar=no, scrollbars=no");
win2.document.write("<h1>HCARD</h1><hr/><p>" + info + "</p>"); win2.document.close();
}]]></xsl:text>
</script>
<xsl:apply-templates/> </xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy> </xsl:template>
<xsl:template match="*[@class='vcard']">
<xsl:apply-templates/> </xsl:template>
<xsl:template match="*[@class='fn']">
<u>
<a>
<xsl:attribute name="onMouseDown">
<xsl:text>show_hcard('</xsl:text>
<xsl:value-of select="text()"/>
<xsl:text>')</xsl:text>
</xsl:attribute>
<xsl:value-of select="text()"/>
</a>
</u> </xsl:template> </xsl:stylesheet>
* matches all elements; the *[@class='vcard'] pattern matches all elements whose class attribute has the value vcard. From that you can figure out what *[@class='fn'] may mean ;-)
I'd also suggest that you start here.
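To see the three patterns side by side, here is a small self-contained sketch. The output element names any, card and name are made up for illustration and do not come from the stylesheet in the question. A pattern with a predicate has a higher default priority (0.5) than a bare * (-0.5), which is why the more specific rules win when several templates match the same element.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Matches every element; default priority -0.5 -->
  <xsl:template match="*">
    <any><xsl:apply-templates/></any>
  </xsl:template>

  <!-- Matches any element with class="vcard"; default priority 0.5 -->
  <xsl:template match="*[@class='vcard']">
    <card><xsl:apply-templates/></card>
  </xsl:template>

  <!-- Matches any element with class="fn"; default priority 0.5 -->
  <xsl:template match="*[@class='fn']">
    <name><xsl:value-of select="."/></name>
  </xsl:template>
</xsl:stylesheet>
```

Applied to <div class="vcard"><span class="fn">Jane Doe</span></div>, this produces <card><name>Jane Doe</name></card>: neither element falls through to the * rule.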
Your stylesheet has four template rules. In English these rules are:
(a) starting at the top (match="/"), first output a script element, then process the next level down (xsl:apply-templates) in the input.
(b) the default rule for elements (match="*") is to create a new element in the output with the same name and attributes as the original, and to construct its content by processing the next level down in the input.
(c) the rule for elements with the attribute class="vcard" is to do nothing with this element, other than to process the next level down in the input.
(d) the rule for elements with the attribute class="fn" is to output
<u><a onMouseDown="show_hcard('X')">X</a></u>
where X is the text content of the element being processed.
A more experienced XSLT user would have written the last rule as
<xsl:template match="*[@class='fn']">
<u>
<a onMouseDown="show_hcard('{.}')">
<xsl:value-of select="."/>
</a>
</u>
</xsl:template>
This is a slightly different version of another question posted here:
XSLT: change node inner text
Imagine I use XSLT to transform the document:
<a>
<b/>
<c/>
</a>
into this:
<a>
<b/>
<c/>
Hello world
</a>
In this case I can use neither the
<xsl:strip-space elements="*"/>
element nor the [normalize-space() != ''] predicate, since there is no text in the place where I need to put the new text. Any ideas? Thanks.
Here is what I would do:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<!-- identity template to copy everything unless otherwise noted -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- match the first text node directly following a <c> element -->
<xsl:template match="text()[preceding-sibling::node()[1][self::c]]">
<!-- ...and change its contents -->
<xsl:text>Hello world</xsl:text>
</xsl:template>
</xsl:stylesheet>
Note that text nodes contain "surrounding" whitespace. In the sample XML in the question, the matched text node is whitespace only, which is why the above works. It will stop working as soon as the input document looks like this:
<a><b/><c/></a>
because there is no text node following <c>. So if this is too brittle for your use case, an alternative would be:
<!-- <c> nodes get a new adjacent text node -->
<xsl:template match="c">
<xsl:copy-of select="." />
<xsl:text>Hello world</xsl:text>
</xsl:template>
<!-- make sure to remove the first text node directly following a <c> node-->
<xsl:template match="text()[preceding-sibling::node()[1][self::c]]" />
In any case, stuff like the above makes clear why intermixing of text nodes and element nodes is best avoided. This is not always possible (see XHTML). But when you have the chance and the XML is supposed to be purely a container for structural data, staying clear of mixed content makes your life easier.
This transformation inserts the desired text (for generality) after the element named a7:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="a7">
<xsl:call-template name="identity"/>
<xsl:text>Hello world</xsl:text>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<a>
<a1/>
<a2/>
.....
<a7/>
<a8/>
</a>
the desired result is produced:
<a>
<a1/>
<a2/>
.....
<a7/>Hello world
<a8/>
</a>
Do note:
The use of the identity rule for copying every node of the source XML document.
The overriding of the identity rule by a specific template that carries out the insertion of the new text.
How the identity rule is both applied (on every node) and called by name (for a specific need).
edit: fixed my fail to put proper syntax in.
<xsl:template match='a'>
<xsl:copy-of select="." />
<xsl:text>Hello World</xsl:text>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>