Handling < > in XSLT 1.0 - xslt

I have a problem, when trying to read a structure having < > in source XML.
Input Structure -
<?xml version="1.0" encoding="UTF-8"?>
<RecordsData>
<RecordsData>
<UID><RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData></UID>
</RecordsData>
</RecordsData>
Expected Output Structure (there are two requirements) -
One is just conversion of < >into well formed XML tags.
<?xml version="1.0" encoding="UTF-8"?>
<RecordsData>
<RecordsData>
<UID><RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData></UID>
</RecordsData>
</RecordsData>
Second is extraction of whole data inside UID tag with output as only below -
<RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData>
I am able to get second output if I have first one in hand. But struggling to get first output from Input over last few days after searching forum extensively and being very new to XSLT.
If we can directly get second output from input source - it's actually what is expected solution. For above - I just tried to break down problem into steps.
Any of experts can you please help!
Thanks,

Conversion is easy, extraction is not.
To convert the escaped markup to real markup, simply disable the escaping when writing the node to the result tree, for example:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="UID">
<xsl:copy>
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Ideally, you would use the resulting XML file to extract any data from the escaped portion. Otherwise you would have to apply string functions for this purpose, since the escaped text is not XML.
However, in your example, you don't want to extract anything particular from the data, just isolate it and convert it to a stand-alone markup document. This can be easily accomplished by:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select="RecordsData/RecordsData/UID" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>

Related

how to get 'excel' new lines in spreadsheetML and the behaviour of nodeset() on disable-output-escaping (Saxon xslt 1.0)

This is a follow up question to
how to get 'excel' new lines in spreadsheetML (MSXSLT)
but asked as a new question, to separate this into different issue, as the behaviour seems to be different between engines (I'll leave the specific context in the other question, this is purely how to achieve some functional result).
This XSLT (in saxon he) will create what I want.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<root>
<bar>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
</bar>
</root>
</xsl:template>
</xsl:stylesheet>
and gives the output
<root>
<bar>
</bar>
</root>
this one wont:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:copy-of select="exsl:node-set($foo)"/>
</root>
</xsl:template>
</xsl:stylesheet>
it gives
<bar>&#10;</bar>
(the question is about XSLT 1.0 but interestingly XSLT 3.0 can be made to work like this
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:sequence select="$foo"/>
</root>
</xsl:template>
</xsl:stylesheet>
whilst
<xsl:copy-of select="$foo"/>
doesnt. Even following the 'sequence' pattern, I don't seem to be able to preserve non escaping in anything but a non trivial xslt - I've got a complex transformation using call-templates/apply-templates etc, and I think understanding how nodes are interpreted and serialised is not trivial)
There's actually a long history to this question, which was known in the working group as the "sticky d-o-e problem" (d-o-e being disable-output-escaping). The question is, does d-o-e have any effect when writing to a temporary tree (an xsl:variable), or is it only effective when writing to serialized output?
The XSLT 1.0 specification is pretty clear on the matter:
It is an error for output escaping to be disabled for a text node that
is used for something other than a text node in the result tree. Thus,
it is an error to disable output escaping for an xsl:value-of or
xsl:text element that is used to generate the string-value of a
comment, processing instruction or attribute node; it is also an error
to convert a result tree fragment to a number or a string if the
result tree fragment contains a text node for which escaping was
disabled. In both cases, an XSLT processor may signal the error; if it
does not signal the error, it must recover by ignoring the
disable-output-escaping attribute.
XSLT 2.0 deprecated d-o-e, but retained the rule in a slightly different form:
This [property], however, can be set only within a final result tree
that is being passed to the serializer.
But in between those two versions, the working group dithered. The XSLT 1.1 working draft (which never became a recommendation, but was popularised by the first version of my XSLT book) says:
When a root node is copied using an xsl:copy-of element ... and
escaping was disabled for a text node descendant of that root node,
then escaping should also be disabled for the resulting copy of that
text node. For example
<xsl:variable name="x">
<xsl:text disable-output-escaping="yes"><</xsl:text>
</xsl:variable>
<xsl:copy-of select="$x"/>
This is the "sticky d-o-e" - the d-o-e property is attached to the text node in the temporary tree and springs into life when the text node is eventually serialized. So this behaviour was endorsed at some stage in the life of XSLT, and you may be using a processor that implements this version of the spec.
Generally, though, try to forget that d-o-e exists. Whatever the problem, it's not the best solution. It's an incredibly messy feature because it requires a breaking of the architectural boundary between the transformation processor and the serializer, and breaking this boundary leads to close coupling of the transformation and serialization, and prevents you reusing the same code in a different pipeline configuration.
I'm afraid that researching the history of the W3C spec on this is rather easier than researching exactly what was implemented in early versions of Saxon (which are now nearly a quarter of a century old).
So to take the information from Michael Kay's answer which explains how the specification for XSLT 1.0 handles this, then we CAN implement a solution, for this.
So we take a recap of the underlying issue.
Excel spreadsheetML requires data to be formatted with the specific chars "
" to interpret a line feed in a cell (but this solution applies generally).
<Cell>Alpha
Bravo
Charlie</Cell>
If we try to write an XSLT to generate this, lets say naively:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text>&#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text>&#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
our
will get delimited and we get this
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
this (thanks to the answer on how to get 'excel' new lines in spreadsheetML (MSXSLT)) can be fixed by using
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
which produces this:
<Cell>Alpha
Bravo
Charlie</Cell>
unfortunately this 'breaks' if you process your output document via some intermediary internal document e.g. even this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<xsl:copy-of select="msxsl:node-set($output)"/>
</xsl:template>
</xsl:stylesheet>
reverts to:
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
because (see Michael Hay's answer) the disable-output-escaping attribute gets ignored if its passed through some internal document (i.e. the variable).
So...how can you get around this?
If you generate a token for the LF, you can then construct your psuedo excel output almost in its entirety except you use a custom element to flag the LF char, and then you can process that DIRECTLY into the result tree and interpret the custom element as an unescaped "
"
so this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:kookerella="kookerella.com"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl kookerella">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<kookerella:LF/>
<xsl:text>Bravo</xsl:text>
<kookerella:LF/>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<!-- process data directly into the result tree only -->
<xsl:apply-templates select="msxsl:node-set($output)" mode="injectLF"/>
</xsl:template>
<!-- Inject LF -->
<xsl:template match="#* | node()" mode="injectLF">
<xsl:copy>
<xsl:apply-templates select="#* | node()" mode="injectLF"/>
</xsl:copy>
</xsl:template>
<xsl:template match="kookerella:LF" mode="injectLF">
<xsl:text disable-output-escaping="yes">&#10;</xsl:text>
<xsl:apply-templates select="#* | node()" mode="injectLF"/>
</xsl:template>
</xsl:stylesheet>
now results in:
<Cell>Alpha
Bravo
Charlie</Cell>
P.S.
as an aside, this seems to work for me in both the various MSXSLT and Saxon HE, but I have had an instance of using the MSXSLT engine where even this doesnt work, presumably due to some configuration out output serialisation issue.

I need to remove a node from an XML based on a Condition using Group by

I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<EMPLOYEE_ASSIGNMENT>
<REFRESH_DATE>2022-03-10 10:55:35.000</REFRESH_DATE>
<PERSON_ID>11189</PERSON_ID>
<EMPLOYEE_ID>032656300</EMPLOYEE_ID>
<EFFECTIVE_START_DATE>2020-08-19 00:00:00.000</EFFECTIVE_START_DATE>
<EFFECTIVE_END_DATE>4712-12-31 00:00:00.000</EFFECTIVE_END_DATE>
<BUSINESS_PROCESS>Absence Return for XXXXXXXXX last day of absence on 08/18/2020, first day back at work on 08/19/2020</BUSINESS_PROCESS>
<ACT_ASSIGNMENT_STATUS_TYPE_ID>1</ACT_ASSIGNMENT_STATUS_TYPE_ID>
<ACT_ORGANIZATION_ID>601</ACT_ORGANIZATION_ID>
<ACT_JOB_QUINTIQ_POSITION>Trainee</ACT_JOB_QUINTIQ_POSITION>
<ACT_HOURS_PER_WEEK>37.5</ACT_HOURS_PER_WEEK>
<ACT_HOURS_FREQUENCY>W</ACT_HOURS_FREQUENCY>
<ACT_BARGAINING_UNIT_CODE>C</ACT_BARGAINING_UNIT_CODE>
<ACT_PRIMARY_PROVINCE>BC</ACT_PRIMARY_PROVINCE>
</EMPLOYEE_ASSIGNMENT>
<EMPLOYEE_ASSIGNMENT>
<REFRESH_DATE>2022-03-10 10:55:35.000</REFRESH_DATE>
<PERSON_ID>11189</PERSON_ID>
<EMPLOYEE_ID>032656300</EMPLOYEE_ID>
<EFFECTIVE_START_DATE>2020-08-19 00:00:00.000</EFFECTIVE_START_DATE>
<EFFECTIVE_END_DATE>4712-12-31 00:00:00.000</EFFECTIVE_END_DATE>
<BUSINESS_PROCESS>Data Change: XXXXXXXXXXXX</BUSINESS_PROCESS>
<ACT_ASSIGNMENT_STATUS_TYPE_ID>1</ACT_ASSIGNMENT_STATUS_TYPE_ID>
<ACT_ORGANIZATION_ID>856</ACT_ORGANIZATION_ID>
<ACT_JOB_QUINTIQ_POSITION>Employee</ACT_JOB_QUINTIQ_POSITION>
<ACT_HOURS_PER_WEEK>37.5</ACT_HOURS_PER_WEEK>
<ACT_HOURS_FREQUENCY>W</ACT_HOURS_FREQUENCY>
<ACT_BARGAINING_UNIT_CODE>C</ACT_BARGAINING_UNIT_CODE>
<ACT_PRIMARY_PROVINCE>MB</ACT_PRIMARY_PROVINCE>
</EMPLOYEE_ASSIGNMENT>
</root>
The two nodes have the same EFFECTIVE_START_DATE but different BUSINESS_PROCESS for a single EMPLOYEE_ID. I need to transform that XML in a way that: when Two (or more) BUSINESS_PROCESS are present for and EMPLOYEE_ID on the same EFFECTIVE_START_DATE it shows only the one that is of value Data Change: XXXXXXXXX.
I need to transform it to:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<EMPLOYEE_ASSIGNMENT>
<REFRESH_DATE>2022-03-10 10:55:35.000</REFRESH_DATE>
<PERSON_ID>11189</PERSON_ID>
<EMPLOYEE_ID>032656300</EMPLOYEE_ID>
<EFFECTIVE_START_DATE>2020-08-19 00:00:00.000</EFFECTIVE_START_DATE>
<EFFECTIVE_END_DATE>4712-12-31 00:00:00.000</EFFECTIVE_END_DATE>
<BUSINESS_PROCESS>Data Change: XXXXXXXXXXXX</BUSINESS_PROCESS>
<ACT_ASSIGNMENT_STATUS_TYPE_ID>1</ACT_ASSIGNMENT_STATUS_TYPE_ID>
<ACT_ORGANIZATION_ID>856</ACT_ORGANIZATION_ID>
<ACT_JOB_QUINTIQ_POSITION>Employee</ACT_JOB_QUINTIQ_POSITION>
<ACT_HOURS_PER_WEEK>37.5</ACT_HOURS_PER_WEEK>
<ACT_HOURS_FREQUENCY>W</ACT_HOURS_FREQUENCY>
<ACT_BARGAINING_UNIT_CODE>C</ACT_BARGAINING_UNIT_CODE>
<ACT_PRIMARY_PROVINCE>MB</ACT_PRIMARY_PROVINCE>
</EMPLOYEE_ASSIGNMENT>
</root>
Thanks a lot
This is difficult to follow. If (!) I understand your description correctly, you want to do:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/root">
<xsl:copy>
<xsl:for-each-group select="EMPLOYEE_ASSIGNMENT" group-by="concat(EMPLOYEE_ID,'|', EFFECTIVE_START_DATE)">
<xsl:copy-of select="current-group()[starts-with(BUSINESS_PROCESS, 'Data Change:')]"/>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

XSLT output empty Debugging fails with no output

I read through forums and i am unable to understand why the output is empty. It might be a simple thing i am missing.
I tried debugging in VS 2017 and it does not give any output any help in this matter is appreciated. IF i input only the ns0:EFACT_D96A_ORDERS_EAN008 node as an input to XSLT the output comes with the "test" content
<?xml version="1.0" encoding="UTF-16"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var" exclude-result-prefixes="msxsl var s0 userCSharp" version="1.0" xmlns:ns1="http://Microsoft.LobServices.Sap/2007/03/Types/Idoc/3/ORDERS05//740" xmlns:ns0="http://Microsoft.LobServices.Sap/2007/03/Idoc/3/ORDERS05//740/Send" xmlns:ns2="http://Microsoft.LobServices.Sap/2007/03/Types/Idoc/Common/" xmlns:s0="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006" xmlns:ns3="http://schemas.microsoft.com/2003/10/Serialization/" xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ins0="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/InterchangeXML">
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
<xsl:template match="/">
<xsl:apply-templates select="/s0:EFACT_D96A_ORDERS_EAN008" />
</xsl:template>
<xsl:template match="/s0:EFACT_D96A_ORDERS_EAN008">
<ns0:Send>
<ns0:idocData>
Test
</ns0:idocData>
</ns0:Send>
</xsl:template>
</xsl:stylesheet>
Input file-
<ins0:EdifactInterchangeXml DelimiterSetSerializedData="39:-1:-1:43:58:63:-1:46:-1" xmlns:ins0="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/InterchangeXML">
<ns0:UNA xmlns:ns0="http://schemas.microsoft.com/Edi/EdifactServiceSchema">
<UNA1>58</UNA1>
</ns0:UNA>
<ns0:UNB xmlns:ns0="http://schemas.microsoft.com/Edi/EdifactServiceSchema">
<UNB1>
<UNB1.1>ABCD</UNB1.1>
<UNB1.2>5</UNB1.2>
</UNB1>
</ns0:UNB>
<TransactionSetGroup>
<TransactionSet DocType="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006#EFACT_D96A_ORDERS_EAN008">
<ns0:EFACT_D96A_ORDERS_EAN008 xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006">
</ns0:EFACT_D96A_ORDERS_EAN008>
</TransactionSet>
</TransactionSetGroup>
<ns0:UNZ xmlns:ns0="http://schemas.microsoft.com/Edi/EdifactServiceSchema">
<UNZ1>1</UNZ1>
<UNZ2>86</UNZ2>
</ns0:UNZ>
</ins0:EdifactInterchangeXml>
A forward slash / at the start of the expression matches the document node, so doing select="/s0:EFACT_D96A_ORDERS_EAN008" will only select s0:EFACT_D96A_ORDERS_EAN008 if it is a child of the document node. i.e. if it is the root element, which it isn't.
To select s0:EFACT_D96A_ORDERS_EAN008 regardless of where it is in the document do this...
<xsl:apply-templates select="//s0:EFACT_D96A_ORDERS_EAN008" />
You also need to remove the single forward slash from the match expression too (You don't need the double-slashes in the match expression, as the match will work regardless of where the element is in the document)
Try this XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var" exclude-result-prefixes="msxsl var s0 userCSharp" version="1.0" xmlns:ns1="http://Microsoft.LobServices.Sap/2007/03/Types/Idoc/3/ORDERS05//740" xmlns:ns0="http://Microsoft.LobServices.Sap/2007/03/Idoc/3/ORDERS05//740/Send" xmlns:ns2="http://Microsoft.LobServices.Sap/2007/03/Types/Idoc/Common/" xmlns:s0="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006" xmlns:ns3="http://schemas.microsoft.com/2003/10/Serialization/" xmlns:userCSharp="http://schemas.microsoft.com/BizTalk/2003/userCSharp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ins0="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/InterchangeXML">
<xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
<xsl:template match="/">
<xsl:apply-templates select="//s0:EFACT_D96A_ORDERS_EAN008" />
</xsl:template>
<xsl:template match="s0:EFACT_D96A_ORDERS_EAN008">
<ns0:Send>
<ns0:idocData>
Test
</ns0:idocData>
</ns0:Send>
</xsl:template>
</xsl:stylesheet>

XSLT - Get String between commas

How can I get the value 'four' in XSLT?
<root>
<entry>(one,two,three,four,five,six)</entry>
</root>
Thanks in advance.
You didn't specify the XSLT version, so I assume version 2.0.
I also assume that word four is only a "marker", stating from which place
take the result string (between the 3rd and 4th comma).
To get the fragment you want, you can:
Use tokenize function to "cut" the whole content of entry
into pieces, using a comma as the cutting pattern.
Take the fourth element of the result array.
This expression can be used e.g. in a template matching entry.
So the example script can look like below:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="entry">
<xsl:copy>
<xsl:value-of select="tokenize(., ',')[4]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy><xsl:apply-templates select="#*|node()"/></xsl:copy>
</xsl:template>
</xsl:transform>
For your input XML it gives:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<entry>four</entry>
</root>

XSLT: How to remove elements of a resulting result tree fragment while copying?

My goal is to extract the contents of the SOAP body, f.e. the ElementsToExtract node - but the node name can basically be arbitrary:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
...
<RemoveMe>...</RemoveMe>
<RemoveMeAlso>...</RemoveMeAlso>
...
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
While I'm extracting the contents, I want to get rid of two elements that all my source documents have in common - say RemoveMe and RemoveMeAlso. As there's a chance that the deeper nested nodes may be called the same, they must only be stripped from the layer below the ElementsToExtract node. How would I formulate that expression?
Here's what I did up to now:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="soap exsl">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="SoapHeaderContents" select="exsl:node-set(soap:Envelope/soap:Header/*)"/>
<xsl:variable name="SoapBodyContents" select="exsl:node-set(soap:Envelope/soap:Body/*)"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
</xsl:template>
<!-- This is global, how to restrict to the ElementsToExtract element? -->
<xsl:template match="node()[name() = 'RemoveMe']"/>
<xsl:template match="node()[name() = 'RemoveMeAlso']"/>
</xsl:stylesheet>
I also played with the node-set() function, having read that one can not modify result tree fragments (they're only text nodes?), but I don't quite understand how to address the resulting nodes of that set. So the nodes weren't removed:
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
<xsl:apply-templates select="$SoapBodyContents/RemoveMe" mode="m1"/>
</xsl:template>
<xsl:template name="StripRemoveMe" match="RemoveMe" mode="m1"/>
I also read some parts of the specification, but to no avail. I'm lost for clues. Can someone direct me to the right approach?
Would this work for you:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
exclude-result-prefixes="soap">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="ElementsToExtract/RemoveMe | ElementsToExtract/RemoveMeAlso"/>
</xsl:stylesheet>
In the (unlikely) case you don't know the name of the ElementsToExtract element, you could use:
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/*"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="soap:Body/*/RemoveMe | soap:Body/*/RemoveMeAlso"/>
Some quick thoughts.
You create variables for storing the SOAP header and body. These are already in the input document, so it makes more sense to just write templates that match these.
Although you create a variable for the SOAP header, you never use it anywhere.
If you try to apply templates in succession, as in your sample XSL code, you will get all the output nodes from the first apply-templates, and then all the output nodes from the next apply-templates. If these nodes are meant to be interleaved in any way, this approach will not produce viable output.
Here's a revised version of your sample input XML, adding in a couple elements that we want to keep.
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<RemoveMe>This is text that will be removed.</RemoveMe>
<RemoveMeAlso>This will also vanish from the output.</RemoveMeAlso>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
Here's what we'd want as output:
<?xml version="1.0" encoding="utf-8"?>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
This XSL 1.0 code will do the job. I'm guessing from your post that you're not familiar with XSL processing flow, so I've added comments to help explain what's going on.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
version="1.0"
exclude-result-prefixes="soap">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<!-- The `/` matches the _logical root_ of the input file. This is
basically equivalent to the start of the file, NOT the first element.
This is a common place to start processing in XSL. -->
<xsl:template match="/">
<!-- We just apply templates. In your case, we know already that
we DON'T want to process everything: we want to leave certain
things out, including a lot of the outermost elements. So
we specify what to target in the `select` statement. -->
<xsl:apply-templates select="soap:Envelope/soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- This is the "identity" template, so called because it
just copies over applicable matches identically.
A template with a more-specific match statement takes
precedence. -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Here, we specify exactly those elements that are in the
processing flow, and that we want to exclude from the
output. Since `soap:Header` etc. are NOT in the processing
flow (their element trees were never included in a preceding
call to `apply-templates`), we don't need to worry about those. -->
<xsl:template match="RemoveMe | RemoveMeAlso"/>
</xsl:stylesheet>
Note that the outermost element in the output is ElementsToExtract. This element will include the xmlns:soap="http://www.w3.org/2003/05/soap-envelope" namespace declaration, even though this namespace isn't used in any of the output elements (at least, for this small sample input XML).
If you can use XSL 2.0+ and you want to remove this namespace from the output, you could add the copy-namespaces="no" attribute to the <xsl:copy> element.