how to get 'excel' new lines in spreadsheetML and the behaviour of node-set() on disable-output-escaping (Saxon xslt 1.0) - xslt

This is a follow-up question to
how to get 'excel' new lines in spreadsheetML (MSXSLT)
but asked as a new question to keep this issue separate, as the behaviour seems to differ between engines. (I'll leave the specific context in the other question; this one is purely about how to achieve a functional result.)
This XSLT (in Saxon HE) creates what I want:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<root>
<bar>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
</bar>
</root>
</xsl:template>
</xsl:stylesheet>
and gives the output
<root>
<bar>&#10;</bar>
</root>
This one won't:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:copy-of select="exsl:node-set($foo)"/>
</root>
</xsl:template>
</xsl:stylesheet>
It gives
<bar>&amp;#10;</bar>
(The question is about XSLT 1.0, but interestingly XSLT 3.0 can be made to work like this:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:sequence select="$foo"/>
</root>
</xsl:template>
</xsl:stylesheet>
whilst
<xsl:copy-of select="$foo"/>
doesn't. Even following the 'sequence' pattern, I can't seem to preserve the non-escaping in anything but a trivial XSLT. I've got a complex transformation using call-template/apply-templates etc., and understanding how nodes are interpreted and serialised there is not trivial.)

There's actually a long history to this question, which was known in the working group as the "sticky d-o-e problem" (d-o-e being disable-output-escaping). The question is, does d-o-e have any effect when writing to a temporary tree (an xsl:variable), or is it only effective when writing to serialized output?
The XSLT 1.0 specification is pretty clear on the matter:
It is an error for output escaping to be disabled for a text node that
is used for something other than a text node in the result tree. Thus,
it is an error to disable output escaping for an xsl:value-of or
xsl:text element that is used to generate the string-value of a
comment, processing instruction or attribute node; it is also an error
to convert a result tree fragment to a number or a string if the
result tree fragment contains a text node for which escaping was
disabled. In both cases, an XSLT processor may signal the error; if it
does not signal the error, it must recover by ignoring the
disable-output-escaping attribute.
XSLT 2.0 deprecated d-o-e, but retained the rule in a slightly different form:
This [property], however, can be set only within a final result tree
that is being passed to the serializer.
But in between those two versions, the working group dithered. The XSLT 1.1 working draft (which never became a recommendation, but was popularised by the first version of my XSLT book) says:
When a root node is copied using an xsl:copy-of element ... and
escaping was disabled for a text node descendant of that root node,
then escaping should also be disabled for the resulting copy of that
text node. For example
<xsl:variable name="x">
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>
</xsl:variable>
<xsl:copy-of select="$x"/>
This is the "sticky d-o-e" - the d-o-e property is attached to the text node in the temporary tree and springs into life when the text node is eventually serialized. So this behaviour was endorsed at some stage in the life of XSLT, and you may be using a processor that implements this version of the spec.
Generally, though, try to forget that d-o-e exists. Whatever the problem, it's not the best solution. It's an incredibly messy feature because it requires a breaking of the architectural boundary between the transformation processor and the serializer, and breaking this boundary leads to close coupling of the transformation and serialization, and prevents you reusing the same code in a different pipeline configuration.
I'm afraid that researching the history of the W3C spec on this is rather easier than researching exactly what was implemented in early versions of Saxon (which are now nearly a quarter of a century old).

Taking the information from Michael Kay's answer, which explains how the XSLT 1.0 specification handles this, we CAN implement a solution.
First, a recap of the underlying issue.
Excel SpreadsheetML requires data to contain the specific characters "&#10;" (a literal character reference) for a line feed to be interpreted within a cell (but this solution applies generally):
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
If we try to write an XSLT to generate this, let's say naively:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text>&amp;#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text>&amp;#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
our &#10; gets escaped, and we get this:
<Cell>Alpha&amp;#10;Bravo&amp;#10;Charlie</Cell>
This (thanks to the answer on how to get 'excel' new lines in spreadsheetML (MSXSLT)) can be fixed by using:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
which produces this:
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
Unfortunately, this 'breaks' if you build your output via some intermediary internal document, e.g. even this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<xsl:copy-of select="msxsl:node-set($output)"/>
</xsl:template>
</xsl:stylesheet>
reverts to:
<Cell>Alpha&amp;#10;Bravo&amp;#10;Charlie</Cell>
because (see Michael Kay's answer) the disable-output-escaping attribute gets ignored if the text is passed through some internal document (i.e. the variable).
So...how can you get around this?
If you generate a token for the LF, you can construct your pseudo-Excel output almost in its entirety, except that you use a custom element to flag each LF. You then process that DIRECTLY into the result tree, interpreting the custom element as an unescaped "&#10;".
So this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:kookerella="kookerella.com"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl kookerella">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<kookerella:LF/>
<xsl:text>Bravo</xsl:text>
<kookerella:LF/>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<!-- process data directly into the result tree only -->
<xsl:apply-templates select="msxsl:node-set($output)" mode="injectLF"/>
</xsl:template>
<!-- Inject LF -->
<xsl:template match="@* | node()" mode="injectLF">
<xsl:copy>
<xsl:apply-templates select="@* | node()" mode="injectLF"/>
</xsl:copy>
</xsl:template>
<xsl:template match="kookerella:LF" mode="injectLF">
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:apply-templates select="@* | node()" mode="injectLF"/>
</xsl:template>
</xsl:stylesheet>
now results in:
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
P.S.
As an aside, this seems to work for me in both the various MSXSLT engines and Saxon HE, but I have had one instance, using an MSXSLT engine, where even this doesn't work, presumably due to some configuration or output serialisation issue.
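Where the line breaks already exist as real LF characters in a string (rather than being known when writing the stylesheet), the same principle, applying disable-output-escaping only at the moment of writing to the final result tree, can be sketched as a recursive named template. This is a minimal sketch, not the method used above: the template name `write-excel-lines` and the literal input string are hypothetical, and it assumes the processor honours d-o-e on the final result tree.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <Cell>
      <!-- hypothetical input: a string that already contains real LF characters -->
      <xsl:call-template name="write-excel-lines">
        <xsl:with-param name="text" select="'Alpha&#10;Bravo&#10;Charlie'"/>
      </xsl:call-template>
    </Cell>
  </xsl:template>

  <!-- Writes $text, replacing every LF with an unescaped character reference.
       Only call this while writing to the final result tree: inside an
       xsl:variable the disable-output-escaping may be ignored again. -->
  <xsl:template name="write-excel-lines">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&#10;')">
        <xsl:value-of select="substring-before($text, '&#10;')"/>
        <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
        <xsl:call-template name="write-excel-lines">
          <xsl:with-param name="text" select="substring-after($text, '&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because the template writes straight into `<Cell>` in the result tree and never into a variable, a processor that supports d-o-e should emit `<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>`.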

Related

XSLT: How to remove elements of a resulting result tree fragment while copying?

My goal is to extract the contents of the SOAP body, e.g. the ElementsToExtract node, but the node name can basically be arbitrary:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
...
<RemoveMe>...</RemoveMe>
<RemoveMeAlso>...</RemoveMeAlso>
...
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
While I'm extracting the contents, I want to get rid of two elements that all my source documents have in common - say RemoveMe and RemoveMeAlso. As there's a chance that the deeper nested nodes may be called the same, they must only be stripped from the layer below the ElementsToExtract node. How would I formulate that expression?
Here's what I did up to now:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="soap exsl">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="SoapHeaderContents" select="exsl:node-set(soap:Envelope/soap:Header/*)"/>
<xsl:variable name="SoapBodyContents" select="exsl:node-set(soap:Envelope/soap:Body/*)"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
</xsl:template>
<!-- This is global, how to restrict to the ElementsToExtract element? -->
<xsl:template match="node()[name() = 'RemoveMe']"/>
<xsl:template match="node()[name() = 'RemoveMeAlso']"/>
</xsl:stylesheet>
I also played with the node-set() function, having read that one can not modify result tree fragments (they're only text nodes?), but I don't quite understand how to address the resulting nodes of that set. So the nodes weren't removed:
<xsl:template match="/">
<xsl:apply-templates select="$SoapBodyContents"/>
<xsl:apply-templates select="$SoapBodyContents/RemoveMe" mode="m1"/>
</xsl:template>
<xsl:template name="StripRemoveMe" match="RemoveMe" mode="m1"/>
I also read some parts of the specification, but to no avail. I'm lost for clues. Can someone direct me to the right approach?
Would this work for you:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
exclude-result-prefixes="soap">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="ElementsToExtract/RemoveMe | ElementsToExtract/RemoveMeAlso"/>
</xsl:stylesheet>
In the (unlikely) case you don't know the name of the ElementsToExtract element, you could use:
<!-- skip soap wrappers -->
<xsl:template match="/soap:Envelope">
<xsl:apply-templates select="soap:Body/*"/>
</xsl:template>
<!-- remove unwanted elements -->
<xsl:template match="soap:Body/*/RemoveMe | soap:Body/*/RemoveMeAlso"/>
Some quick thoughts.
You create variables for storing the SOAP header and body. These are already in the input document, so it makes more sense to just write templates that match these.
Although you create a variable for the SOAP header, you never use it anywhere.
If you try to apply templates in succession, as in your sample XSL code, you will get all the output nodes from the first apply-templates, and then all the output nodes from the next apply-templates. If these nodes are meant to be interleaved in any way, this approach will not produce viable output.
Here's a revised version of your sample input XML, adding in a couple elements that we want to keep.
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<MessageId>52DF2371-4094-4408-A3EA-42D73FD1B7A3</MessageId>
</soap:Header>
<soap:Body>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<RemoveMe>This is text that will be removed.</RemoveMe>
<RemoveMeAlso>This will also vanish from the output.</RemoveMeAlso>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
</soap:Body>
</soap:Envelope>
Here's what we'd want as output:
<?xml version="1.0" encoding="utf-8"?>
<ElementsToExtract>
<KeepMe>This text will persist in the output.</KeepMe>
<OtherElementToKeep>And this one will also be kept.</OtherElementToKeep>
</ElementsToExtract>
This XSL 1.0 code will do the job. I'm guessing from your post that you're not familiar with XSL processing flow, so I've added comments to help explain what's going on.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
version="1.0"
exclude-result-prefixes="soap">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<!-- The `/` matches the _logical root_ of the input file. This is
basically equivalent to the start of the file, NOT the first element.
This is a common place to start processing in XSL. -->
<xsl:template match="/">
<!-- We just apply templates. In your case, we know already that
we DON'T want to process everything: we want to leave certain
things out, including a lot of the outermost elements. So
we specify what to target in the `select` statement. -->
<xsl:apply-templates select="soap:Envelope/soap:Body/ElementsToExtract"/>
</xsl:template>
<!-- This is the "identity" template, so called because it
just copies over applicable matches identically.
A template with a more-specific match statement takes
precedence. -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Here, we specify exactly those elements that are in the
processing flow, and that we want to exclude from the
output. Since `soap:Header` etc. are NOT in the processing
flow (their element trees were never included in a preceding
call to `apply-templates`), we don't need to worry about those. -->
<xsl:template match="RemoveMe | RemoveMeAlso"/>
</xsl:stylesheet>
Note that the outermost element in the output is ElementsToExtract. This element will include the xmlns:soap="http://www.w3.org/2003/05/soap-envelope" namespace declaration, even though this namespace isn't used in any of the output elements (at least, for this small sample input XML).
If you can use XSL 2.0+ and you want to remove this namespace from the output, you could add the copy-namespaces="no" attribute to the <xsl:copy> element.
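A minimal XSLT 2.0 sketch of that suggestion: the stylesheet above with the version raised to 2.0 and `copy-namespaces="no"` added to the identity template's `xsl:copy` (untested here; the attribute is ignored when the context item is not an element).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
    exclude-result-prefixes="soap">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="soap:Envelope/soap:Body/ElementsToExtract"/>
  </xsl:template>

  <!-- identity template; copy-namespaces="no" stops xsl:copy from carrying
       the in-scope (but unused) soap namespace onto the copied elements -->
  <xsl:template match="@*|node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="RemoveMe | RemoveMeAlso"/>
</xsl:stylesheet>
```

With the sample input this should give the same `ElementsToExtract` output, minus the `xmlns:soap` declaration.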

XPath 3.0 Serialize without Namespaces in Scope

While answering this question, it occurred to me that I know how to use the XSLT 3.0 (XPath 3.0) serialize() function, but that I do not know how to avoid serialization of namespaces that are in scope. Here is a minimal example:
XML Input
<?xml version="1.0" encoding="UTF-8" ?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
XSLT 3.0 Stylesheet
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
xmlns:ci="http://www.cichlids.com">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/ci:cichlids/cichlid">
<xsl:variable name="serial-params">
<output:serialization-parameters>
<output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
</xsl:variable>
<xsl:value-of select="serialize(., $serial-params/*)"/>
</xsl:template>
</xsl:stylesheet>
Actual Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid xmlns:ci="http://www.cichlids.com" id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
The serialization process included the namespace declaration that is in scope for the cichlid element, although it is not used on this element. I would like to remove this declaration and make the output look like
Expected Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
I know how to modify the cichlid element, removing the namespaces in scope, and serialize this modified element instead. But this seems a rather cumbersome solution. My question is:
What is a canonical way to serialize an XML element using the serialize() function without also serializing unused namespace declarations that are in scope?
Testing with Saxon-EE 9.6.0.7 from within Oxygen.
Serialization will always give you a faithful representation of the data model that you are serializing. If you want to modify the data model, that's called transformation. Run a transformation to remove the unwanted namespaces, then serialize the result.
Michael Kay already gave the correct answer and I have accepted it. This is just to flesh out his comments. By
Run a transformation to remove the unwanted namespaces, then serialize the result.
he means applying a transformation like the following before calling serialize():
XSLT Stylesheet
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
xmlns:ci="http://www.cichlids.com">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:variable name="cichlid-without-namespace">
<xsl:copy-of copy-namespaces="no" select="/ci:cichlids/cichlid"/>
</xsl:variable>
<xsl:template match="/ci:cichlids/cichlid">
<xsl:variable name="serial-params">
<output:serialization-parameters>
<output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
</xsl:variable>
<xsl:value-of select="serialize($cichlid-without-namespace, $serial-params/*)"/>
</xsl:template>
</xsl:stylesheet>
XML Output
<?xml version="1.0" encoding="UTF-8"?>
<ci:cichlids xmlns:ci="http://www.cichlids.com">
<cichlid id="1">
<name>Zeus</name>
<color>gold</color>
<teeth>molariform</teeth>
<breeding-type>lekking</breeding-type>
</cichlid>
</ci:cichlids>
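One caveat with the stylesheet above: `$cichlid-without-namespace` is a global variable holding copies of *all* `/ci:cichlids/cichlid` elements, so with more than one `cichlid` every match would be replaced by the serialization of every cichlid. A sketch that copies only the element being matched, under the same assumptions as the original (XSLT 3.0, element-style serialization parameters):

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
    xmlns:ci="http://www.cichlids.com">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/ci:cichlids/cichlid">
    <!-- copy only the matched element, dropping in-scope namespaces it does not use -->
    <xsl:variable name="cichlid-without-namespace">
      <xsl:copy-of select="." copy-namespaces="no"/>
    </xsl:variable>
    <xsl:variable name="serial-params">
      <output:serialization-parameters>
        <output:omit-xml-declaration value="yes"/>
      </output:serialization-parameters>
    </xsl:variable>
    <xsl:value-of select="serialize($cichlid-without-namespace, $serial-params/*)"/>
  </xsl:template>
</xsl:stylesheet>
```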

Handling < > in XSLT 1.0

I have a problem when trying to read a structure that has &lt; &gt; in the source XML.
Input Structure -
<?xml version="1.0" encoding="UTF-8"?>
<RecordsData>
<RecordsData>
<UID>&lt;RecordsData xmlns=""&gt;&lt;RecordsData&gt;&lt;UID&gt;200&lt;/UID&gt;&lt;RID&gt;Test-1&lt;/RID&gt;&lt;Date&gt;20142812&lt;/Date&gt;&lt;Status&gt;N&lt;/Status&gt;&lt;/RecordsData&gt;&lt;/RecordsData&gt;</UID>
</RecordsData>
</RecordsData>
Expected Output Structure (there are two requirements) -
One is just conversion of &lt; &gt; into well-formed XML tags.
<?xml version="1.0" encoding="UTF-8"?>
<RecordsData>
<RecordsData>
<UID><RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData></UID>
</RecordsData>
</RecordsData>
Second is extraction of whole data inside UID tag with output as only below -
<RecordsData xmlns=""><RecordsData><UID>200</UID><RID>Test-1</RID><Date>20142812</Date><Status>N</Status></RecordsData></RecordsData>
I am able to get the second output if I have the first one in hand, but I have been struggling to get the first output from the input over the last few days, even after searching the forum extensively; I am very new to XSLT.
If we can get the second output directly from the input source, that is actually the expected solution. Above, I just tried to break the problem down into steps.
Can any of the experts please help?
Thanks,
Conversion is easy, extraction is not.
To convert the escaped markup to real markup, simply disable the escaping when writing the node to the result tree, for example:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="UID">
<xsl:copy>
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Ideally, you would use the resulting XML file to extract any data from the escaped portion. Otherwise you would have to apply string functions for this purpose, since the escaped text is not XML.
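For illustration, a minimal sketch of the string-function route, assuming you only want one value out of the escaped text (the choice of the `RID` field is hypothetical). It treats the `UID` content purely as a string and cuts it between the escaped tags:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- the UID content is plain text here, so slice it between the escaped RID tags -->
    <xsl:value-of select="substring-before(
        substring-after(RecordsData/RecordsData/UID, '&lt;RID&gt;'),
        '&lt;/RID&gt;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample input this outputs `Test-1`. It quickly becomes fragile for anything more than a single flat value, which is why converting to real XML first is preferable.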
However, in your example, you don't want to extract anything particular from the data, just isolate it and convert it to a stand-alone markup document. This can be easily accomplished by:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select="RecordsData/RecordsData/UID" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>

Copying an entire XML document into a variable using XSLT

How can I copy an entire XML document, as is, into a variable?
Below is the sample xml:
<?xml version="1.0" encoding="UTF-8"?>
<products author="Jesper">
<product id="p1">
<name>Delta</name>
<price>800</price>
<stock>4</stock>
</product>
</products>
I have tried the XSLT below, but it is not working.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:variable name="reqMsg">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:variable>
<xsl:copy-of select="$reqMsg"/>
</xsl:template>
</xsl:stylesheet>
Regards,
Rahul
Your transformation fails because at a certain point, it tries to create a variable (result tree fragment) containing an attribute node. This is not allowed.
It's not really clear what you mean by "copying an entire XML to a variable". But you probably want to simply use the select attribute on the root node:
<xsl:variable name="reqMsg" select="/"/>
This will actually create a variable with a node-set containing the root node of the document. Using this variable with xsl:copy-of will output the whole document.
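Put together, a minimal corrected version of the stylesheet might look like this (the variable name `reqMsg` is kept from the question):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- bound with select, so this is a node-set holding the source root node,
       not a result tree fragment -->
  <xsl:variable name="reqMsg" select="/"/>

  <xsl:template match="/">
    <!-- deep-copies the whole source document to the output -->
    <xsl:copy-of select="$reqMsg"/>
  </xsl:template>
</xsl:stylesheet>
```

Because `$reqMsg` is a real node-set, you can also navigate into it directly in XSLT 1.0, e.g. `$reqMsg/products/product/name`, without any `node-set()` extension.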
<xsl:copy-of select="document('path/to/file.xml')" />
Or if you need it more than once, to avoid repeating the doc name:
<xsl:variable name="filepath" select="'path/to/file.xml'" />
…
<xsl:copy-of select="document($filepath)" />
The result of document() should be cached IIRC, so don't worry about calling it repeatedly.

Removing empty tags within a variable

My question is related to another poster's StackOverflow question on Two Phase Processing. I didn't want to use mode="#all" without fully understanding it and how it could affect the rest of my XSLT. I'm thinking the below code accomplishes the same thing without risking interference with other templates, but would like confirmation. It kind of seems like I am processing $completepolicy twice without need to do so.
Empty tag definition: <field/> <field></field>. Tags can have attributes but there will never be an empty tag that has an attribute. There will also never be nodes with <field> </field> where the white space could represent many other things.
Given this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<!-- many other apply-templates here -->
<xsl:variable name="completepolicy" as="element()">
<holder>
<TABLE1 type="global">
<col1>Red</col1>
<col2/>
</TABLE1>
<TABLE2>
<field1>Blue</field1>
<field2/>
</TABLE2>
</holder>
</xsl:variable>
<xsl:apply-templates mode="emptytags" select="$completepolicy/*"/>
</xsl:template>
<xsl:template match="*[not(node())]" mode="emptytags"/>
<xsl:template match="node() | @*" mode="emptytags">
<xsl:copy>
<xsl:apply-templates select="node() | @*" mode="#current"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Results in this output for $completepolicy:
<TABLE1 type="global">
<col1>Red</col1>
</TABLE1>
<TABLE2>
<field1>Blue</field1>
</TABLE2>
Why do you think the $completepolicy variable is being processed twice? This cannot be seen in the provided code.
I confirm that the provided code looks good to me.
I would recommend never to use mode="#all". This is too powerful and dangerous -- this is almost never needed.
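If the goal is the two-phase processing referred to in the question, separate named modes keep the phases apart without `#all`. A minimal sketch, assuming a hypothetical identity copy as phase 1 (your real default-mode templates would take its place):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- phase 1: default-mode templates build the intermediate tree -->
    <xsl:variable name="completepolicy">
      <xsl:apply-templates/>
    </xsl:variable>
    <!-- phase 2: a separate named mode post-processes that tree -->
    <xsl:apply-templates select="$completepolicy/*" mode="emptytags"/>
  </xsl:template>

  <!-- hypothetical phase 1 rule: a plain identity copy -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- phase 2 rules: drop empty elements, copy everything else -->
  <xsl:template match="*[not(node())]" mode="emptytags"/>
  <xsl:template match="node() | @*" mode="emptytags">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" mode="#current"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Each set of rules only fires in its own mode, so the empty-tag rules cannot interfere with the rest of the stylesheet.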