XSLT: How to remove elements of a resulting result tree fragment while copying? - xslt

My goal is to extract the contents of the SOAP body, f.e. the ElementsToExtract node - but the node name can basically be arbitrary:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
...
<RemoveMe>...</RemoveMe>
<RemoveMeAlso>...</RemoveMeAlso>
...
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
While I'm extracting the contents, I want to get rid of two elements that all my source documents have in common - say RemoveMe and RemoveMeAlso. As there's a chance that the deeper nested nodes may be called the same, they must only be stripped from the layer below the ElementsToExtract node. How would I formulate that expression?
Here's what I did up to now:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="soap exsl">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="SoapHeaderContents" select="exsl:node-set(soap:Envelope/soap:Header/*)"/>
<xsl:variable name="SoapBodyContents" select="exsl:node-set(soap:Envelope/soap:Body/*)"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
</xsl:template>
<!-- This is global, how to restrict to the ElementsToExtract element? -->
<xsl:template match="node()[name() = 'RemoveMe']"/>
<xsl:template match="node()[name() = 'RemoveMeAlso']"/>
</xsl:stylesheet>
I also played with the node-set() function, having read that one can not modify result tree fragments (they're only text nodes?), but I don't quite understand how to address the resulting nodes of that set. So the nodes weren't removed:
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
<xsl:apply-templates select="$SoapBodyContents/RemoveMe" mode="m1"/>
</xsl:template>
<xsl:template name="StripRemoveMe" match="RemoveMe" mode="m1"/>
I also read some parts of the specification, but to no avail. I'm lost for clues. Can someone direct me to the right approach?

Would this work for you:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
exclude-result-prefixes="soap">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="ElementsToExtract/RemoveMe | ElementsToExtract/RemoveMeAlso"/>
</xsl:stylesheet>
In the (unlikely) case you don't know the name of the ElementsToExtract element, you could use:
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/*"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="soap:Body/*/RemoveMe | soap:Body/*/RemoveMeAlso"/>

Some quick thoughts.
You create variables for storing the SOAP header and body. These are already in the input document, so it makes more sense to just write templates that match these.
Although you create a variable for the SOAP header, you never use it anywhere.
If you try to apply templates in succession, as in your sample XSL code, you will get all the output nodes from the first apply-templates, and then all the output nodes from the next apply-templates. If these nodes are meant to be interleaved in any way, this approach will not produce viable output.
Here's a revised version of your sample input XML, adding in a couple elements that we want to keep.
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<RemoveMe>This is text that will be removed.</RemoveMe>
<RemoveMeAlso>This will also vanish from the output.</RemoveMeAlso>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
Here's what we'd want as output:
<?xml version="1.0" encoding="utf-8"?>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
This XSL 1.0 code will do the job. I'm guessing from your post that you're not familiar with XSL processing flow, so I've added comments to help explain what's going on.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
version="1.0"
exclude-result-prefixes="soap">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<!-- The `/` matches the _logical root_ of the input file. This is
basically equivalent to the start of the file, NOT the first element.
This is a common place to start processing in XSL. -->
<xsl:template match="/">
<!-- We just apply templates. In your case, we know already that
we DON'T want to process everything: we want to leave certain
things out, including a lot of the outermost elements. So
we specify what to target in the `select` statement. -->
<xsl:apply-templates select="soap:Envelope/soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- This is the "identity" template, so called because it
just copies over applicable matches identically.
A template with a more-specific match statement takes
precedence. -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Here, we specify exactly those elements that are in the
processing flow, and that we want to exclude from the
output. Since `soap:Header` etc. are NOT in the processing
flow (their element trees were never included in a preceding
call to `apply-templates`), we don't need to worry about those. -->
<xsl:template match="RemoveMe | RemoveMeAlso"/>
</xsl:stylesheet>
Note that the outermost element in the output is ElementsToExtract. This element will include the xmlns:soap="http://www.w3.org/2003/05/soap-envelope" namespace declaration, even though this namespace isn't used in any of the output elements (at least, for this small sample input XML).
If you can use XSL 2.0+ and you want to remove this namespace from the output, you could add the copy-namespaces="no" attribute to the <xsl:copy> element.

Related

xsl:apply-templates returns nothing − what am I missing?

I have a simple XML response, like
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
<numberOfRecords>1</numberOfRecords>
<records>
<record>
<recordData>
<kitodo xmlns="http://meta.kitodo.org/v1/">
<metadata name="key1">value1</metadata>
<metadata name="key2">value2</metadata>
<metadata name="key3">value3</metadata>
</kitodo>
</recordData>
</record>
</records>
</searchRetrieveResponse>
which I want to transform to this by XSLT
<?xml version="1.0" encoding="utf-8"?>
<mets:mdWrap xmlns:kitodo="http://meta.kitodo.org/v1/"
xmlns:mets="http://www.loc.gov/METS/"
xmlns:srw="http://www.loc.gov/zing/srw/"
MDTYPE="OTHER"
OTHERMDTYPE="Kitodo">
<mets:xmlData>
<kitodo:kitodo>
<kitodo:metadata name="key1">value1</kitodo:metadata>
<kitodo:metadata name="key2">value2</kitodo:metadata>
<kitodo:metadata name="key3">value3</kitodo:metadata>
</kitodo:kitodo>
</mets:xmlData>
</mets:mdWrap>
That is, I want to remove the outside tree searchRetrieveResponse/records/record/recordData, replace it with mdWrap/xmlData and move the contained data node there.
I have a quite short XSLT for it:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:kitodo="http://meta.kitodo.org/v1/" xmlns:mets="http://www.loc.gov/METS/" xmlns:srw="http://www.loc.gov/zing/srw/">
<xsl:output method="xml" indent="yes" encoding="utf-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="srw:recordData">
<mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="Kitodo">
<mets:xmlData>
<xsl:apply-templates select="#*|node()"/>
</mets:xmlData>
</mets:mdWrap>
</xsl:template>
<!-- pass-through rule -->
<xsl:template match="#*|node()">
<xsl:apply-templates select="#*|node()"/>
</xsl:template>
</xsl:stylesheet>
However, what I get is:
<?xml version="1.0" encoding="utf-8"?>
<mets:mdWrap xmlns:kitodo="http://meta.kitodo.org/v1/"
xmlns:mets="http://www.loc.gov/METS/"
xmlns:srw="http://www.loc.gov/zing/srw/"
MDTYPE="OTHER"
OTHERMDTYPE="Kitodo">
<mets:xmlData/>
</mets:mdWrap>
Obviously, the template match="srw:recordData" does match, otherwise I would get an empty result. However, the contained apply-templates doesn’t output anything. (I also tried an <xsl:apply-templates/> without a select="" attribute, but it doesn’t output anything either.) What am I missing?
XSLT processor is net.sf.saxon.TransformerFactoryImpl (Java)
I think nothing happens when you are applying templates inside xmlData. There are no templates that would match descendant nodes.
Try using copy-of:
<xsl:template match="srw:recordData">
<mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="Kitodo">
<mets:xmlData>
<xsl:copy-of select="kitodo:kitodo"/>
</mets:xmlData>
</mets:mdWrap>
</xsl:template>
The problem is not with the xsl:apply-templates instruction. It is with the template being applied. Your "pass-through rule" does not write anything to the output. You probably meant to have the identity transform template in that place - which goes like this:
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>

XPath 3.0 Serialize without Namespaces in Scope

While answering this question, it occurred to me that I know how to use the XSLT 3.0 (XPath 3.0) serialize() function, but that I do not know how to avoid serialization of namespaces that are in scope. Here is a minimal example:
XML Input
<?xml version="1.0" encoding="UTF-8" ?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
XSLT 3.0 Stylesheet
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
xmlns:ci="http://www.cichlids.com">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/ci:cichlids/cichlid">
<xsl:variable name="serial-params">
<output:serialization-parameters>
<output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
</xsl:variable>
<xsl:value-of select="serialize(., $serial-params/*)"/>
</xsl:template>
</xsl:stylesheet>
Actual Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid xmlns:ci="http://www.cichlids.com" id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
The serialization process included the namespace declaration that is in scope for the cichlid element, although it is not used on this element. I would like to remove this declaration and make the output look like
Expected Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
I know how to modify the cichlid element, removing the namespaces in scope, and serialize this modified element instead. But this seems a rather cumbersome solution. My question is:
What is a canonical way to serialize an XML element using the serialize() function without also serializing unused namespace declarations that are in scope?
Testing with Saxon-EE 9.6.0.7 from within Oxygen.
Serialization will always give you a faithful representation of the data model that you are serializing. If you want to modify the data model, that's called transformation. Run a transformation to remove the unwanted namespaces, then serialize the result.
Michael Kay already gave the correct answer and I have accepted it. This is just to flesh out his comments. By
Run a transformation to remove the unwanted namespaces, then serialize the result.
he means applying a transformation like the following before calling serialize():
XSLT Stylesheet
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
xmlns:ci="http://www.cichlids.com">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="cichlid-without-namespace">
<xsl:copy-of copy-namespaces="no" select="/ci:cichlids/cichlid"/>
</xsl:variable>
<xsl:template match="/ci:cichlids/cichlid">
<xsl:variable name="serial-params">
<output:serialization-parameters>
<output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
</xsl:variable>
<xsl:value-of select="serialize($cichlid-without-namespace, $serial-params/*)"/>
</xsl:template>
</xsl:stylesheet>
XML Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>

XSLT code to remove xml tag

I have an incoming XML like below: I need to remove the <shoeboxImage> tag from the incoming below XML.
Incoming XML Input:
<attachReceipt>
<baseMessage>
<returnCode>200</returnCode>
</baseMessage>
<payload>
<returnCode>0</returnCode>
<shoeboxItem>
<shoeboxImageCount>2</shoeboxImageCount>
<shoeboxImages>
<shoeboxImage>
<name>receiptImage.jpg</name>
</shoeboxImage>
<shoeboxImage>
<name>receiptImage.jpg</name>
</shoeboxImage>
</shoeboxImages>
</shoeboxItem>
</payload>
</attachReceipt>
Expected Output:
<attachReceipt>
<baseMessage>
<returnCode>200</returnCode>
</baseMessage>
<payload>
<returnCode>0</returnCode>
<shoeboxItem>
<shoeboxImageCount>2</shoeboxImageCount>
<shoeboxImages>
<name>receiptImage.jpg</name>
<name>receiptImage.jpg</name>
</shoeboxImages>
</shoeboxItem>
</payload>
</attachReceipt>
Need some xslt code snippet to do this.
I don't have the necessary software installed to actually test this, but this should work:
<xsl:template match="shoeboxImage">
<xsl:apply-templates select="*|text()"/>
</xsl:template>
The idea is that when a shoeboxImage element is encountered, it generates nothing for the element itself, and just continues with its children.
You need to have an identity template and a template that will remove the element shoeboxImage but will retain its descendants.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<!-- identity template -->
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<!-- template override for the element shoeboxImage -->
<xsl:template match="shoeboxImage">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>

XSLT: How to discard unwanted HTML nodes from source?

I am using XSLT 1.0, and using xsltproc on OS X Yosemite.
The source content is HTML; the target content is XML.
The issue is a fairly common one. I want all "uninteresting"
nodes simply to be discarded from the output. I've seen catch-all
directives like this:
<xsl:template match="node()|script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
This is close to what I need. But unfortunately, it's too strong when I need to add another template that visits one of the text nodes caught by node(). For example, suppose I added this template:
<xsl:template match="a/div[#class='location']/br">
<xsl:text> </xsl:text>
</xsl:template>
which simply replaces certain <br/> elements with spaces.
Well, node() precludes this latter template from taking effect,
because the relevant text node containing the line-break is discarded
already!
Well, to correct the issue, here's what I have done in lieu of the catch-all node():
<xsl:template match="html/head|div[#id='banner_parent']|button|ul|div[#id='feed_title']|span|div[#class='submit_event']|script"/>
But this is precisely the problem: I am now piecing together a template
whose matching criteria is likely to be error-prone when the source
content changes.
Is there a simpler directive that would accomplish the same thing? I'm aiming for something like this:
<xsl:template match="node()[not(locations)]|script"/>
Thanks.
If i understood correctly, you want only some nodes in the output and the rest you dont care abour, in this example I try to catch only li elements and throw the rest away.. not sure if this is what you want though http://xsltransform.net/gWmuiKk
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<!-- Lets pretend li is interesting for you -->
<xsl:template match="li">
<xsl:text>Interesting Node Only!
</xsl:text>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:apply-templates select="#*|node()"/>
</xsl:template>
</xsl:transform>

Handling < > in XSLT 1.0

I have a problem, when trying to read a structure having < > in source XML.
Input Structure -
<?xml version="1.0" encoding="UTF-8"?>
<RecordsData>
<RecordsData>
<UID><RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData></UID>
</RecordsData>
</RecordsData>
Expected Output Structure (there are two requirements) -
One is just conversion of < >into well formed XML tags.
<?xml version="1.0" encoding="UTF-8"?>
<RecordsData>
<RecordsData>
<UID><RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData></UID>
</RecordsData>
</RecordsData>
Second is extraction of whole data inside UID tag with output as only below -
<RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData>
I am able to get second output if I have first one in hand. But struggling to get first output from Input over last few days after searching forum extensively and being very new to XSLT.
If we can directly get second output from input source - it's actually what is expected solution. For above - I just tried to break down problem into steps.
Any of experts can you please help!
Thanks,
Conversion is easy, extraction is not.
To convert the escaped markup to real markup, simply disable the escaping when writing the node to the result tree, for example:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="UID">
<xsl:copy>
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Ideally, you would use the resulting XML file to extract any data from the escaped portion. Otherwise you would have to apply string functions for this purpose, since the escaped text is not XML.
However, in your example, you don't want to extract anything particular from the data, just isolate it and convert it to a stand-alone markup document. This can be easily accomplished by:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select="RecordsData/RecordsData/UID" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>