I want to select all texts (including node names) under a node - xslt

I currently have a xml file like this:
<aaa>
<b>I am a <i>boy</i></b>.
</aaa>
How can I get the exact string as: <b>I am a <i>boy</i></b>.? Thanks.

You have to tell XSLT that you want to copy elements through as well. That can be done with an additional rule. Note that I use custom select clauses on my apply-templates elements to select attributes as well as all node-type objects. Also note that the rule for aaa takes precedence, and does not copy the aaa element itself to the output.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="aaa">
<xsl:apply-templates select="#*|node()"/>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

<aaa>
<b>I am a <i>boy</i></b>.
</aaa>
How can I get the exact string as:
<b>I am a <i>boy</i></b>.?
The easiest/shortest way to do this in your case is to output the result of the following XPath expression:
/*/node()
This means: "Select all nodes that are children of the top element."
Of course, there are some white-space-only text nodes that we don't want selected, but XSLT can take care of this, so the XPath expression is just as simple as shown above.
Now, to get the result with an XSLT transformation, we use the following:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="/*/node()"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document, the wanted result is produced:
<b>I am a <i>boy</i></b>.
Do note:
The use of the <xsl:copy-of> xslt instruction (not <xsl:value-of>), which copies nodes, not string values.
The use of the <xsl:strip-space elements="*"/> XSLT instruction, directing the XSLT processor to ignore any white-space-only text node in the XML document.

Related

What's the cleanest way to produce a copy of an existing XML file excluding elements that match a couple different XPath expressions? [duplicate]

I want a simple XSLT which will only keep the elements which contain a certain regex:
<example>
<abc>text</abc>
<bc>text</bc>
<ab>text</ab>
</example>
I want the same XML output but only with the elements which contain an "a":
<example>
<abc>text</abc>
<ab>text</ab>
</example>
Start with the identity transform and add a template that suppresses elements whose name does not match your regex:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(matches(name(), 'a'))]"/>
</xsl:stylesheet>
Explanation: By default the identity transformation will copy everything over to the output as-is. Override this default behavior by writing a simple template that matches elements without "a" in their name and does nothing, thereby preventing such elements from appearing in the output document.

XSLT: How to remove elements of a resulting result tree fragment while copying?

My goal is to extract the contents of the SOAP body, f.e. the ElementsToExtract node - but the node name can basically be arbitrary:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
...
<RemoveMe>...</RemoveMe>
<RemoveMeAlso>...</RemoveMeAlso>
...
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
While I'm extracting the contents, I want to get rid of two elements that all my source documents have in common - say RemoveMe and RemoveMeAlso. As there's a chance that the deeper nested nodes may be called the same, they must only be stripped from the layer below the ElementsToExtract node. How would I formulate that expression?
Here's what I did up to now:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="soap exsl">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="SoapHeaderContents" select="exsl:node-set(soap:Envelope/soap:Header/*)"/>
<xsl:variable name="SoapBodyContents" select="exsl:node-set(soap:Envelope/soap:Body/*)"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
</xsl:template>
<!-- This is global, how to restrict to the ElementsToExtract element? -->
<xsl:template match="node()[name() = 'RemoveMe']"/>
<xsl:template match="node()[name() = 'RemoveMeAlso']"/>
</xsl:stylesheet>
I also played with the node-set() function, having read that one can not modify result tree fragments (they're only text nodes?), but I don't quite understand how to address the resulting nodes of that set. So the nodes weren't removed:
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
<xsl:apply-templates select="$SoapBodyContents/RemoveMe" mode="m1"/>
</xsl:template>
<xsl:template name="StripRemoveMe" match="RemoveMe" mode="m1"/>
I also read some parts of the specification, but to no avail. I'm lost for clues. Can someone direct me to the right approach?
Would this work for you:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
exclude-result-prefixes="soap">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="ElementsToExtract/RemoveMe | ElementsToExtract/RemoveMeAlso"/>
</xsl:stylesheet>
In the (unlikely) case you don't know the name of the ElementsToExtract element, you could use:
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/*"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="soap:Body/*/RemoveMe | soap:Body/*/RemoveMeAlso"/>
Some quick thoughts.
You create variables for storing the SOAP header and body. These are already in the input document, so it makes more sense to just write templates that match these.
Although you create a variable for the SOAP header, you never use it anywhere.
If you try to apply templates in succession, as in your sample XSL code, you will get all the output nodes from the first apply-templates, and then all the output nodes from the next apply-templates. If these nodes are meant to be interleaved in any way, this approach will not produce viable output.
Here's a revised version of your sample input XML, adding in a couple elements that we want to keep.
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<RemoveMe>This is text that will be removed.</RemoveMe>
<RemoveMeAlso>This will also vanish from the output.</RemoveMeAlso>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
Here's what we'd want as output:
<?xml version="1.0" encoding="utf-8"?>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
This XSL 1.0 code will do the job. I'm guessing from your post that you're not familiar with XSL processing flow, so I've added comments to help explain what's going on.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
version="1.0"
exclude-result-prefixes="soap">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<!-- The `/` matches the _logical root_ of the input file. This is
basically equivalent to the start of the file, NOT the first element.
This is a common place to start processing in XSL. -->
<xsl:template match="/">
<!-- We just apply templates. In your case, we know already that
we DON'T want to process everything: we want to leave certain
things out, including a lot of the outermost elements. So
we specify what to target in the `select` statement. -->
<xsl:apply-templates select="soap:Envelope/soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- This is the "identity" template, so called because it
just copies over applicable matches identically.
A template with a more-specific match statement takes
precedence. -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Here, we specify exactly those elements that are in the
processing flow, and that we want to exclude from the
output. Since `soap:Header` etc. are NOT in the processing
flow (their element trees were never included in a preceding
call to `apply-templates`), we don't need to worry about those. -->
<xsl:template match="RemoveMe | RemoveMeAlso"/>
</xsl:stylesheet>
Note that the outermost element in the output is ElementsToExtract. This element will include the xmlns:soap="http://www.w3.org/2003/05/soap-envelope" namespace declaration, even though this namespace isn't used in any of the output elements (at least, for this small sample input XML).
If you can use XSL 2.0+ and you want to remove this namespace from the output, you could add the copy-namespaces="no" attribute to the <xsl:copy> element.

XSLT/XPath: Treat <tag> as ordinary string, not node

I have an XML:
<foo>
<bar>some text</bar>
</foo>
And I'm generating an HTML from it using XSTL, and looking for an Xpath (or some XSTL method, don't know) that gives me the whole content of foo. To illustrate my problem,
<xsl:value-of select="foo"/>
as expected, delivers only
some text
. But is there anything I can do to get
<bar>some text</bar>
? So to treat it as if the bar tags would be just ordinary strings.
Well, there's certainly no way of treating <tag> as a string, because XSLT sees the output of the XML parser, which is a tree of nodes: the tags have long since disappeared by the time XSLT swings into action.
But you can of course copy the element node as a whole, rather than just extracting its string value. Simply use <xsl:copy-of> in place of <xsl:value-of>.
You can use Copy
HTML
<foo>
<bar>some text</bar>
</foo>
XSL
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="bar">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
<bar>some text</bar>

XSLT: How to discard unwanted HTML nodes from source?

I am using XSLT 1.0, and using xsltproc on OS X Yosemite.
The source content is HTML; the target content is XML.
The issue is a fairly common one. I want all "uninteresting"
nodes simply to be discarded from the output. I've seen catch-all
directives like this:
<xsl:template match="node()|script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
This is close to what I need. But unfortunately, it's too strong when I need to add another template that visits one of the text nodes caught by node(). For example, suppose I added this template:
<xsl:template match="a/div[#class='location']/br">
<xsl:text> </xsl:text>
</xsl:template>
which simply replaces certain <br/> elements with spaces.
Well, node() precludes this latter template from taking effect,
because the relevant text node containing the line-break is discarded
already!
Well, to correct the issue, here's what I have done in lieu of the catch-all node():
<xsl:template match="html/head|div[#id='banner_parent']|button|ul|div[#id='feed_title']|span|div[#class='submit_event']|script"/>
But this is precisely the problem: I am now piecing together a template
whose matching criteria is likely to be error-prone when the source
content changes.
Is there a simpler directive that would accomplish the same thing? I'm aiming for something like this:
<xsl:template match="node()[not(locations)]|script"/>
Thanks.
If i understood correctly, you want only some nodes in the output and the rest you dont care abour, in this example I try to catch only li elements and throw the rest away.. not sure if this is what you want though http://xsltransform.net/gWmuiKk
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<!-- Lets pretend li is interesting for you -->
<xsl:template match="li">
<xsl:text>Interesting Node Only!
</xsl:text>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:apply-templates select="#*|node()"/>
</xsl:template>
</xsl:transform>

XSLT: remove all but the first occurrence of a given node

I have XML something like this:
<MyXml>
<RandomNode1>
<TheNode>
<a/>
<b/>
<c/>
</TheNode>
</RandomeNode1>
<RandomNode2>
</RandomNode2>
<RandomNode3>
<RandomNode4>
<TheNode>
<a/>
<b/>
<c/>
</TheNode>
</RandomNode4>
</RandomNode3>
</MyXml>
Where <TheNode> appears throughout the XML but not at the same level, often deep within other nodes. What I need to do is eliminate all occurrences of <TheNode> EXCEPT the first. The rest are redundant and taking up space. What would be the XSL that could do this?
I have something like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node() | #*">
<xsl:copy>
<xsl:apply-templates select="node() | #*" />
</xsl:copy>
</xsl:template>
<xsl:template match="//TheNode[position()!=1]">
</xsl:template>
</xsl:stylesheet>
But that is not correct. Any suggestions?
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node() | #*">
<xsl:copy>
<xsl:apply-templates select="node() | #*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="TheNode[preceding::TheNode]"/>
</xsl:stylesheet>
//TheNode[position()!=1] does not work because here, position() is relative to the parent context of each <TheNode>. It would select all <TheNode>s which are not first within their respective parent.
But you were on the right track. What you meant was:
(//TheNode)[position()!=1]
Note the parentheses - they cause the predicate to be applied to the entire selected node-set, instead of to each node individually.
Unfortunately, even though this is valid XPath expression, it is not valid as a match pattern. A match pattern must be meaningful (applicable) to an individual node, it cannot be a select expression.
So #Alohci's solution,
//TheNode[preceding::TheNode]
is the correct way to express what you want.
Other approach for the pattern would be:
<xsl:template match="TheNode[generate-id()
!= generate-id(/descendant::TheNode[1)]"/>
Note: It's more likely that an absolute expression gets optimizated inteads of a relative expression like preceding::TheNode