XSLT: keeping whitespaces when copying attributes - xslt

I'm trying to sort Microsoft Visual Studio's vcproj so that a diff would show something meaningful after e.g. deleting a file from a project. Besides the sorting, I want to keep everything intact, including whitespaces. The input looks like
space<File
spacespaceRelativePath="filename"
spacespace>
...
The xslt fragment below can add the spaces around elements, but I can't find out how to deal with those around attributes, so my output looks like
space<File RelativePath="filename">
xslt I use for the msxsl 4.0 processor:
<xsl:for-each select="File">
  <xsl:sort select="@RelativePath"/>
  <xsl:value-of select="preceding-sibling::text()[1]"/>
  <xsl:copy>
    <xsl:for-each select="text()|@*">
      <xsl:copy/>
    </xsl:for-each>
  </xsl:copy>
</xsl:for-each>

Those spaces are always insignificant in XML, and I believe that there is no option to control this behavior in a general way for any XML/XSLT library.

XSLT works on a tree representation of the input XML. Much of the irrelevant detail of the original XML has been abstracted away in this tree, for example the order of attributes, insignificant whitespace between attributes, or the distinction between " and ' as an attribute delimiter. I can't see any conceivable reason for wanting to write a program that treats these distinctions as significant.
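What the tree model does allow is sorting the elements while carrying along the whitespace text nodes that sit between them. The following is a minimal sketch, not a tested vcproj tool. It assumes XSLT 1.0, that every File is preceded by a whitespace text node, and that the matched parent contains nothing but File elements and whitespace (comments directly inside it would be dropped). Whitespace inside the start tags is still lost, for the reason given above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: only elements whose element children are all File. -->
  <xsl:template match="*[File and not(*[not(self::File)])]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="File">
        <xsl:sort select="@RelativePath"/>
        <!-- Re-emit the whitespace text node that preceded this File. -->
        <xsl:value-of select="preceding-sibling::text()[1]"/>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
      <!-- Trailing whitespace before the end tag, if the last child is text. -->
      <xsl:value-of select="node()[last()][self::text()]"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Parents with mixed content (for example File next to other elements) fall through to the identity template and are copied unsorted.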

Related

XSLT engine removing whitespace from <xsl:text> element in text output

I have a transformation that outputs a fixed length output, so where data is "missing" I pad the data with spaces.
so using
<xsl:output method="text" encoding="ISO-8859-1"/>
and then using things like
<xsl:text> </xsl:text>
whenever I need to pad for missing data.
This works very nicely in my dev environment (using Visual Studio),
but when I deploy it to another machine (which also uses the MS XSLT engine, but with unknown parameters, allegedly 6.0), some of my whitespace gets "removed" and my data gets misaligned.
If I replace the above with things like
<xsl:text> </xsl:text>
then that doesn't get stripped.
this also works
<xsl:text xml:space="preserve"> </xsl:text>
Is there a way to get the XSLT to 'preserve' standard spaces in the "text" elements globally?
(The transformation is quite big and I don't want to have to go through replacing every space with a character reference, plus it worries me that it's not doing 'exactly' what I'm asking it to do.)
(I've looked at other questions on this, which is where I got the character reference from, but I'd like to turn the XSLT behaviour OFF.)

xsl text and normal string

What is the difference between
<xsl:param name="abc">123</xsl:param>
<xsl:param name="def"><xsl:text>123</xsl:text></xsl:param>
They both work the same, but is there some difference between the two?
<xsl:text> will allow you to manipulate the text (escape as well as white space):
<xsl:text disable-output-escaping="yes|no">
From http://msdn.microsoft.com/en-us/library/ms256107(v=vs.110).aspx:
In a style sheet, text can be generated to the literal result tree
with or without <xsl:text>. However, with this element you can exert
some control over the white space created by the style sheet. For
example, to make your style sheet more readable, you might want to
write one element per line in a template, and indent some lines. Doing
so introduces white space as part of the template rule. This might or
might not be a desired effect of the transformation. Sometimes you
might want to introduce a white space character to separate two data
values. You can use an <xsl:text> element to accomplish this. White
space enclosed within <xsl:text> is output to the result tree.
The main reason for using <xsl:text> is that whitespace-only text in XSLT stylesheets is normally discarded/stripped (http://www.w3.org/TR/xslt#strip). Text enclosed in <xsl:text> is the exception to that rule. So if you want to explicitly output spaces, tabs, or newlines you may need to use <xsl:text>.
<xsl:text> also allows case-by-case control over the output escaping mode. That's less frequently used; disabling output escaping is usually the wrong solution unless you're generating non-XML/non-HTML output.
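A small sketch of the stripping rule described above, assuming XSLT 1.0 and the text output method. The letters A, B and C are placeholders, not taken from the question. The line breaks and indentation between the instructions are whitespace-only text nodes and are stripped from the stylesheet, while the space inside <xsl:text> is kept.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The newline and indentation between these two instructions
         are whitespace-only text and are stripped. -->
    <xsl:value-of select="'A'"/>
    <xsl:value-of select="'B'"/>
    <!-- Whitespace inside xsl:text is the exception and is kept. -->
    <xsl:text> </xsl:text>
    <xsl:value-of select="'C'"/>
  </xsl:template>

</xsl:stylesheet>
```

A conforming processor is expected to write AB C for any input document.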
[Apologies for the confusion, now corrected above. I just noticed that I had indeed been looking at a Working Draft of XSLT 1.0, not the final Recommendation. Mea culpa, mea culpa, mea maxima culpa.]

XSLT Identity Transformation without change to the output

Is it possible to do an XSLT identity transformation where absolutely nothing is changed from the source?
When I use the following template, indentation and line breaks are changed in the output, and I don't want to make any changes to the source XML.
XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
INPUT
<S:Envelope
xmlns:S="http://www.w3.org/2003/05/soap-envelope"
xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
xmlns:f123="http://www.fabrikam123.example/svc53">
<S:Header>
<wsa:MessageID>
uuid:aaaabbbb-cccc-dddd-eeee-wwwwwwwwwww
</wsa:MessageID>
<wsa:RelatesTo>
uuid:aaaabbbb-cccc-dddd-eeee-ffffffffffff
</wsa:RelatesTo>
<wsa:To S:mustUnderstand="1">
http://business456.example/client1
</wsa:To>
<wsa:Action>http://fabrikam123.example/mail/DeleteAck</wsa:Action>
</S:Header>
<S:Body>
<f123:DeleteAck/>
</S:Body>
</S:Envelope>
OUTPUT
<?xml version="1.0" encoding="UTF-8"?><S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:f123="http://www.fabrikam123.example/svc53">
<S:Header>
<wsa:MessageID>
uuid:aaaabbbb-cccc-dddd-eeee-wwwwwwwwwww
</wsa:MessageID>
<wsa:RelatesTo>
uuid:aaaabbbb-cccc-dddd-eeee-ffffffffffff
</wsa:RelatesTo>
<wsa:To S:mustUnderstand="1">
http://business456.example/client1
</wsa:To>
<wsa:Action>http://fabrikam123.example/mail/DeleteAck</wsa:Action>
</S:Header>
<S:Body>
<f123:DeleteAck/>
</S:Body>
</S:Envelope>
No, you cannot. The input and output XML will be the "same" in the sense that they produce the same XML Infoset, but they will not necessarily be byte-for-byte identical and this is not something that XSLT can control.
Why do you need this? If you are trying to compare XML documents easily, consider using XML Canonicalization. Many XML libraries have a method of producing canonical XML, and the xmllint command line tool can produce it easily from files.
The default behavior of XSLT processors is to preserve whitespace in the input, and the behavior of the processors I've just tested is consistent with the spec.
But the whitespace in question is whitespace in the text nodes of the input.
The whitespace between attribute-value specifications in start-tags, and the whitespace between items (e.g. comments and processing instructions) in the prolog and epilog of the document are not text nodes, and are not affected by the preserve-space settings. That white space is also, in fact, not part of the XPath data model, so there is very little the processor can legitimately do to preserve it.
If the whitespace in question carries information, you will want to revisit the design of the vocabulary (it's really a bad idea for that whitespace to be significant); if it's just that you would prefer that there be newlines between attribute-value specifications, you may want to write a custom serializer to insert such newlines and indentation on output. (If your motive is to avoid confusing a diff program with whitespace differences, my experience is that your choices are to normalize whitespace before diffing or to get a diff program that's a bit more robust in the face of whitespace variation.) Good luck.
In general it's not possible to be 100% confident that you'll get exactly everything unchanged, because the XSLT data model simply doesn't preserve all the information from the parse. For example, if the input contains &#60; then the output might contain &lt;. Similarly, CDATA sections aren't preserved: adjacent text nodes (CDATA sections and normal text nodes) are merged into one at parse time, and while you can configure the processor to use CDATA for the content of certain elements, you can't simply preserve them as they were.
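As a sketch of the CDATA configuration mentioned above: the cdata-section-elements attribute of xsl:output asks the serializer to wrap the text content of the named elements in CDATA sections. The element name script is a hypothetical example, not something from the question. This applies to all text in those elements, whether or not the input used CDATA there.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "script" is a placeholder; list the elements whose text should
       be serialized as CDATA sections, separated by spaces. -->
  <xsl:output method="xml" indent="no" cdata-section-elements="script"/>

  <!-- Identity template. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```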
There are other issues such as the fact that the data model doesn't distinguish between <foo></foo>, <foo/> and <foo /> - they all represent the same empty element and any of them from the input could be represented by any of them in the output. And as in your example white space between attributes within a start tag is not preserved.
But of course these differences are all things that an XML tool shouldn't care about as they're different ways to represent exactly the same infoset.

need to display char in xslt

Hi all
I am using XSLT 1.0. I have the char code F0A7, which has to be displayed as the corresponding character. My input is
<w:sym w:font="Wingdings" w:char="F0A7"/>
my xslt template is
<xsl:template match="w:sym">
  <xsl:variable name="char" select="@w:char"/>
  <span font-family="{@w:fonts}">
    <xsl:value-of select="concat('&#x',$char,';')"/>
  </span>
</xsl:template>
It shows the error: ERROR: 'A decimal representation must immediately follow the "&#" in a character reference.'
Please help me in fixing this. Thanks in advance.
This isn't possible in (reasonable) XSLT. You can work around it.
Your solution with concat is invalid: XSLT is not just a fancy string-concatenator, it really transforms the conceptual tree. An encoded character such as &#xF0A7; is a single character. If you were to somehow include the letters & # x f 0 a 7 ; then the XSLT processor would be required to include these letters in the XML data as literal text, not as a character reference. So that means it will escape them.
There's no feature in XSLT 1.0 that permits converting from a number to a character with that codepoint.
In XSLT 2.0, as Michael Kay points out, you can use codepoints-to-string() to achieve this.
There are two solutions. Firstly, you could use disable-output-escaping. This is rather nasty and not portable. Avoid this at all costs if you can - but it will probably work in your transformer, and it's probably the only general, simple solution, so you may not be able to avoid this.
The second solution would be to hardcode matches for each individual character. That's a mess generally, but quite possible if you're dealing with a limited set of possibilities - that depends on your specific problem.
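A minimal sketch of the hardcoded approach, assuming XSLT 1.0. The namespace URI bound to w: is an assumption (use whatever the source document declares), and the second code, F0B7, is only a placeholder showing how further entries would be added.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">

  <xsl:template match="w:sym">
    <span font-family="{@w:font}">
      <xsl:choose>
        <!-- One branch per supported code; the character reference is
             resolved when the stylesheet itself is parsed. -->
        <xsl:when test="@w:char = 'F0A7'">&#xF0A7;</xsl:when>
        <xsl:when test="@w:char = 'F0B7'">&#xF0B7;</xsl:when>
        <!-- Unknown codes: fall back to writing the code itself. -->
        <xsl:otherwise>
          <xsl:value-of select="@w:char"/>
        </xsl:otherwise>
      </xsl:choose>
    </span>
  </xsl:template>

</xsl:stylesheet>
```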
Finally, I'd recommend not solving this problem in XSLT - this is typically something you can do in pre/post processing in another programming environment more appropriately. Most likely, you've an in-memory representation of the XML document to be able to use XSLT in the first place, in which case this won't even take much CPU time.
<span font-family="{@w:font}">
  <xsl:value-of select="concat('&amp;#x', @w:char, ';')"
                disable-output-escaping="yes"/>
</span>
Though check @Eamon Nerbonne's answer for why you shouldn't do it at all.
If you were using XSLT 2.0 (which you aren't), you could write a function to convert hex to decimal, and then use codepoints-to-string() on the result.
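For reference, a sketch of that XSLT 2.0 approach. The function name f:hex-to-int and its namespace are made up for this example, and the namespace URI bound to w: is an assumption. Characters that are not hex digits are silently treated as 0, so real code may want validation.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="xs f w">

  <!-- Hypothetical helper: 'F0A7' becomes 61607. Works right to left:
       value = 16 * value(all but last digit) + value(last digit). -->
  <xsl:function name="f:hex-to-int" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      if ($hex = '') then 0
      else 16 * f:hex-to-int(substring($hex, 1, string-length($hex) - 1))
           + string-length(substring-before('0123456789ABCDEF',
               upper-case(substring($hex, string-length($hex)))))"/>
  </xsl:function>

  <xsl:template match="w:sym">
    <span font-family="{@w:font}">
      <xsl:value-of select="codepoints-to-string(f:hex-to-int(@w:char))"/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```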
Use '&amp;' for '&' in the output:
<xsl:value-of select="concat('&amp;#x',$char,';')"/>

Automatically converting escaped characters to string literals

I am working on an XSLT transformation to re-arrange XML blocks to validate NewsML files. Some of these files contain encoded characters (such as &amp;, &quot; etc.). The problem is that the XSLT transformation is converting these characters to their literal string (ie "and", "'"). This is causing problems. I do not want this to happen.
I have experimented with various techniques (uses of <xsl:text>, <xsl:value-of> and the disable-output-escaping flag, <xsl:output method='xml|html|xhtml|text'>) to no avail. These methods either convert the characters or simply leave them out.
eg, a string which starts with "stars on PM&apos;s cards" can end up as
stars on PM's cards
stars on PMs cards
I am using the Saxonica (http://www.saxonica.com/) processing app.
The basic XSLT I am using is provided below. (There are other things but the problem exists even with this simplest stylesheet)
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no" />
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Any ideas on how to prevent this conversion would be most appreciated. The requirement is to keep the original text as it appears.
I think you need to do both the disable-output-escaping="yes" and set the document to HTML at the same time.
FROM W3C (emphasis mine):
It is an error for output escaping to be disabled for a text node that is used for something other than a text node in the result tree. Thus, it is an error to disable output escaping for an xsl:value-of or xsl:text element that is used to generate the string-value of a comment, processing instruction or attribute node; it is also an error to convert a result tree fragment to a number or a string if the result tree fragment contains a text node for which escaping was disabled. In both cases, an XSLT processor may signal the error; if it does not signal the error, it must recover by ignoring the disable-output-escaping attribute.
The disable-output-escaping attribute may be used with the html output method as well as with the xml output method. The text output method ignores the disable-output-escaping attribute, since it does not perform any output escaping.
An XSLT processor will only be able to disable output escaping if it controls how the result tree is output. This may not always be the case. For example, the result tree may be used as the source tree for another XSLT transformation instead of being output. An XSLT processor is not required to support disabling output escaping. If an xsl:value-of or xsl:text specifies that output escaping should be disabled and the XSLT processor does not support this, the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping.
If output escaping is disabled for a character that is not representable in the encoding that the XSLT processor is using for output, then the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping.
Since disabling output escaping may not work with all XSLT processors and can result in XML that is not well-formed, it should be used only when there is no alternative.
These are entities. Usually they get mapped to a unicode representation of that entity. The final stream will just contain the characters. If you output the stream it's up to the serializer to escape the characters depending on the output type (which is what you can disable with disable-output-escaping). So a proper serializer should turn this
<xsl:output method="html" encoding="UTF-8"/>
<xsl:text>some test</xsl:text>
into
some test
See section 5 on this article.
So I would check that with your XSLT processor first.
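Because the escaping is decided by the serializer, one option that none of the answers above mention is an XSLT 2.0 character map, which the stylesheet in the question (version 2.0) should be able to use. The sketch below is an assumption about what would fit, not a tested fix. The map name keep-apos is made up, and the map rewrites every apostrophe in text and attribute values as &apos; on output, not only those that were escaped in the input.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical map name. The string is written to the output
       as-is, without further escaping. -->
  <xsl:character-map name="keep-apos">
    <xsl:output-character character="'" string="&amp;apos;"/>
  </xsl:character-map>

  <xsl:output method="xml" indent="no" use-character-maps="keep-apos"/>

  <!-- Identity template, as in the question. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Character maps only take effect when the processor serializes the result itself, which is the same restriction the quoted W3C text places on disable-output-escaping.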