One pesky blank line in an XSLT transform - xslt

I'm using XSLT to extract some data from a trademark XML file from the Patent and Trademark Office. It's mostly okay, except for one blank line. I can get rid of it with a moderately ugly workaround, but I'd like to know if there a better way.
Here's a subset of my XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tm="http://www.wipo.int/standards/XMLSchema/trademarks" xmlns:pto="urn:us:gov:doc:uspto:trademark:status">
<xsl:output method="text" encoding="utf-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="tm:Transaction">
<xsl:apply-templates select=".//tm:TradeMark"/>
<xsl:apply-templates select=".//tm:ApplicantDetails"/>
<xsl:apply-templates select=".//tm:MarkEvent"/>
</xsl:template>
<xsl:template match="tm:TradeMark">
MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
ApplicationNumber,"<xsl:value-of select="normalize-space(tm:ApplicationNumber)"/>"<xsl:text/>
ApplicationDate,"<xsl:value-of select="normalize-space(tm:ApplicationDate)"/>"<xsl:text/>
RegistrationNumber,"<xsl:value-of select="normalize-space(tm:RegistrationNumber)"/>"<xsl:text/>
RegistrationDate,"<xsl:value-of select="normalize-space(tm:RegistrationDate)"/>"<xsl:text/>
<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
<xsl:template match="tm:WordMarkSpecification">
MarkVerbalElementText,"<xsl:value-of select="normalize-space(tm:MarkVerbalElementText)"/>"<xsl:text/>
</xsl:template>
It has a few more templates, but that's the gist of it. I always get a blank line at the very beginning of the output, before any data; I don't get any other blank lines. My circumvention is to combine the two lines:
<xsl:template match="tm:TradeMark">
MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
into a single line:
<xsl:template match="tm:TradeMark">MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
This works, and I guess I'm okay with it if there's nothing better, but it seems inelegant and like a kludge to me. None of the other templates need this treatment (e.g. the tm:WordMarkSpecification template or another six after that)), and I'm confused why it's needed here. Any ideas?
Because I can see the specific point in the XSLT that's inserting the blank line, I presume it's not helpful to provide the XML I'm testing on, but if you do need to see it, you can get it at https://tsdrapi.uspto.gov/ts/cd/casestatus/rn2178784/download.zip ; it's the XML file in that archive.

Use at the beginning of the template the same trick you are using at the end of the template to chop up the stylesheet node tree with empty <xsl:text/> instructions:
<xsl:template match="tm:TradeMark">
<xsl:text/>MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>

Personally, I think it's cleaner to use a concat() when you need to combine static text and dynamic values:
<xsl:template match="tm:TradeMark">
<xsl:value-of
select="concat(
'MarkCurrentStatusDate,"', normalize-space(tm:MarkCurrentStatusDate), '"',
'ApplicationNumber,"', normalize-space(tm:ApplicationNumber), '"',
'ApplicationDate,"', normalize-space(tm:ApplicationDate), '"',
'RegistrationNumber,"', normalize-space(tm:RegistrationNumber), '"',
'RegistrationDate,"', normalize-space(tm:RegistrationDate), '"'
)"/>
<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
This should also solve your issue with the blank spaces showing up.

Remember you can always just play with the XML syntax to ignore end-of-line sequences that are inside of start and end tag delimiters:
<xsl:template match="tm:TradeMark"
>MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"
/>"ApplicationNumber,"<xsl:value-of select="normalize-space(tm:ApplicationNumber)"
/>"ApplicationDate,"<xsl:value-of select="normalize-space(tm:ApplicationDate)"
/>"RegistrationNumber,"<xsl:value-of select="normalize-space(tm:RegistrationNumber)"
/>"RegistrationDate,"<xsl:value-of select="normalize-space(tm:RegistrationDate)"
/>"<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
There is no rule in XML that a tag's closing delimiter /> has to be on the same line as the tag's opening delimiter <. White-space inside of a tag is ignored (where innocuous), and an end-of-line sequence is considered white-space.

If you want all of the text to be emitted without extra line breaks or white-space, then put the literal text inside of <xsl:text> elements.
<xsl:template match="tm:TradeMark">
<xsl:text>MarkCurrentStatusDate,"</xsl:text>
<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>
<xsl:text>"</xsl:text>
<xsl:text>ApplicationNumber,"</xsl:text>
<xsl:value-of select="normalize-space(tm:ApplicationNumber)"/>
<xsl:text>"</xsl:text>
<xsl:text>ApplicationDate,"</xsl:text>
<xsl:value-of select="normalize-space(tm:ApplicationDate)"/>
<xsl:text>"</xsl:text>
<xsl:text>RegistrationNumber,"</xsl:text>
<xsl:value-of select="normalize-space(tm:RegistrationNumber)"/>
<xsl:text>"</xsl:text>
<xsl:text>RegistrationDate,"</xsl:text>
<xsl:value-of select="normalize-space(tm:RegistrationDate)"/>
<xsl:text>"</xsl:text>
<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
That way, none of the line breaks and white-space inside of the <xsl:template> will be seen as significant and will not be included in the result tree output.

Related

XSLT - replace specific content of the text() node with a new node

I have a xml like this,
<doc>
<p>Biological<sub>89</sub> bases<sub>4456</sub> for<sub>8910</sub> sexual<sub>4456</sub>
differences<sub>8910</sub> in<sub>4456</sub> the brain exist in a wide range of
vertebrate species, including chickens<sub>8910</sub> Recently<sub>8910</sub> the
dogma<sub>8910</sub> of<sub>4456</sub> hormonal dependence for the sexual
differentiation of the brain has been challenged.</p>
</doc>
As you can see there are <sub> nodes and text() node contains inside the <p> node. and every <sub> node end, there is a text node, starting with a space. (eg: <sub>89</sub> bases : here before 'bases' text appear there is a space exists.) I need to replace those specific spaces with nodes.
SO the expected output should look like this,
<doc>
<p>Biological<sub>89</sub><s/>bases<sub>4456</sub><s/>for<sub>8910</sub><s/>sexual<sub>4456</sub>
<s/>differences<sub>8910</sub><s/>in<sub>4456</sub><s/>the brain exist in a wide range of
vertebrate species, including chickens<sub>8910</sub><s/>Recently<sub>8910</sub><s/>the
dogma<sub>8910</sub><s/>of<sub>4456</sub><s/>hormonal dependence for the sexual
differentiation of the brain has been challenged.</p>
</doc>
to do this I can use regular expression like this,
<xsl:template match="p/text()">
<xsl:analyze-string select="." regex="( )">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1)">
<s/>
</xsl:when>
</xsl:choose>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
But this adds <s/> nodes to every spaces in the text() node. But I only need thi add nodes to that specific spaces.
Can anyone suggest me a method how can I do this..
If you only want to match text nodes that start with a space and are preceded by a sub element, you can put the condition in your template match
<xsl:template match="p/text()[substring(., 1, 1) = ' '][preceding-sibling::node()[1][self::sub]]">
And if you just want to remove the space at the start of the string, a simple replace will do.
<xsl:value-of select="replace(., '^\s+', '')" />
Try this XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="no" />
<xsl:template match="p/text()[substring(., 1, 1) = ' '][preceding-sibling::node()[1][self::sub]]">
<s />
<xsl:value-of select="replace(., '^\s+', '')" />
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Just change the regex like so ^( ): it will match only the spaces at the beginning of the text part.
With this XSL snipped:
<xsl:analyze-string select="." regex="^( )">
Here is the result I obtain:
<p>Biological<sub>89</sub><s></s>bases<sub>4456</sub><s></s>for<sub>8910</sub><s></s>sexual<sub>4456</sub>
differences<sub>8910</sub><s></s>in<sub>4456</sub><s></s>the brain exist in a wide range of
vertebrate species, including chickens<sub>8910</sub><s></s>Recently<sub>8910</sub><s></s>the
dogma<sub>8910</sub><s></s>of<sub>4456</sub><s></s>hormonal dependence for the sexual
differentiation of the brain has been challenged.
</p>

parsing string in xslt

I have following xml
<xml>
<xref>
is determined “in prescribed manner”
</xref>
</xml>
I want to see if we can process xslt 2 and return the following result
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
“in prescribed manner”
</xref>
</xml>
I tried few options like replace the space and entities and then using for-each loop but not able to work it out. May be we can use tokenize function of xslt 2.0 but don't know how to use it. Any hint will be helpful.
# JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="left_quote" select="'<'"/>
<xsl:variable name="right_quote" select="'>'"/>
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote,')?(.*)')"/>
<xsl:analyze-string select="$string" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="concat(' ', normalize-space(regex-group(1)))"/>
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contained protected tokens again. So we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*|#*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="text()"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
There is the general assumption that white space only serves as a token delimiter and need not be preserved.
“ and rdquo; seem to be invalid in XML although they are valid in HTML. In the XSLT there are variables defined holding the quote characters. They will have to be adapted once you find the right XML representation. You can also eliminate the variables and put the characters right into the regular expression pattern. It will be significantly simplified by this.
<xsl:analyze-string> does not allow a regular expression which may evaluate into an empty string. This comes as a little problem since either the prefix and/or the proteced token and/or the suffix may be empty. I take care of this by artificially adding a space at the beginning of the pattern which allows me to search for the prefix using + (at least one occurence) instead of * (zero or more occurences).

Check for CR or LF in XSLT

I have a input XML like this :
<in_xml>
<company>
<project>
ProjNo1
ProjNo2
ProjNo3
</project>
</company>
</in_xml>
A simple XSLT is applied to this source, which writes another XML with the value of Project Tag.
The Project tag in input xml has three lines , it could be one or more line(s). I am looking at way for the XSLT to read only the first line, in case there are more than one and write the first line in the output xml.
The current XSLT is very simple as it just reads the Project tag and spits out the value, hence the code is not attached.
Regards.
I have added the answer to the question, see below #Maestro's answer.
If you are in the happy circumstances of being able to apply XSLT 2.0, the following may help:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="tokenize(normalize-space(.),' ')[1]" />
</xsl:template>
</xsl:stylesheet>
Explanation: first normalize-space() to replace all whitespace strings by a single blank (and cut off leading and trailing whitespace), then split into words, then take the first one.
In XSLT 1.0 you could use
<xsl:value-of select="substring-before(normalize-space(.), ' ')"/>
instead. Less flexible if the second word has to be selected, but for the first word it works OK.
EDIT
you asked how to retrieve a first line in XSLT 1.0 - problem here is the leading whitespace which may contain a LF so you cannot just substring-before the first LF.
The below can probably be improved upon, but it works fine:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:variable name="afterLeadingWS"
select="substring-after(., substring-before(.,substring-before(normalize-space(.), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '
')">
<xsl:value-of select="substring-before($afterLeadingWS, '
')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Explanation: first get first word as before, then determine the whitespace before that first word, then get everything after that leading whitespace, then get the first line, which is the string before a LF character. It may just happen that there is no LF except maybe in the leading whitespace, hence the choose function.
first up thanks to #Maestro for taking time out and helping me, appreciate your help.
Here is the code that I used to get the first line from a paragraph of text that has a CR:
<xsl:variable name="projNumber" select="ProjectNumber" />
<xsl:variable name="crlf" select="'
'" />
<xsl:choose>
<xsl:when test="contains($projNumber,$crlf)">
<xsl:value-of select="substring-before($projNumber,$crlf)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$projNumber"/>
</xsl:otherwise>
</xsl:choose>
This can be written as a function but I don't know how to do it, maybe someone can guide but there you go. A better approach that a colleague of mine suggested is to escape the CR and directly use it in substring function this would avoid all those variables in the first place.
Thanks again.
Here's the code again, now with a function getFirstLine.
Note the addtional namespace that is needed.
Also note that this does require XSLT 2.0 (xsl:function is not available in 1.0).
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://temp.com/functions">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="f:getFirstLine(.)"/>
</xsl:template>
<xsl:function name="f:getFirstLine">
<xsl:param name="input"/>
<xsl:variable name="afterLeadingWS" select="substring-after($input, substring-before($input,substring-before(normalize-space($input), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '
')">
<xsl:value-of select="substring-before($afterLeadingWS, '
')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>

Trim whitespace from parent element only

I'd like to trim the leading whitespace inside p tags in XML, so this:
<p> Hey, <em>italics</em> and <em>italics</em>!</p>
Becomes this:
<p>Hey, <em>italics</em> and <em>italics</em>!</p>
(Trimming trailing whitespace won't hurt, but it's not mandatory.)
Now, I know normalize-whitespace() is supposed to do this, but if I try to apply it to the text nodes..
<xsl:template match="text()">
<xsl:text>[</xsl:text>
<xsl:value-of select="normalize-space(.)"/>
<xsl:text>]</xsl:text>
</xsl:template>
...it's applied to each text node (in brackets) individually and sucks them dry:
[Hey,]<em>[italics]</em>[and]<em>[italics]</em>[!]
My XSLT looks basically like this:
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
So is there any way I can let apply-templates complete and then run normalize-space on the output, which should do the right thing?
This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()[1][generate-id()=
generate-id(ancestor::p[1]
/descendant::text()[1])]">
<xsl:variable name="vFirstNotSpace"
select="substring(normalize-space(),1,1)"/>
<xsl:value-of select="concat($vFirstNotSpace,
substring-after(.,$vFirstNotSpace))"/>
</xsl:template>
</xsl:stylesheet>
Output:
<p>Hey, <em>italics</em> and <em>italics</em>!</p>
Edit 2: Better expression (now only three function calls).
Edit 3: Matching the first descendant text node (not just the first node if it's a text node). Thanks to #Dimitre's comment.
Now, with this input:
<p><b> Hey, </b><em>italics</em> and <em>italics</em>!</p>
Output:
<p><b>Hey, </b><em>italics</em> and <em>italics</em>!</p>
I would do something like this:
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
<!-- strip leading whitespace -->
<xsl:template match="p/node()[1][self::text()]">
<xsl:call-template name="left-trim">
<xsl:with-param name="s" value="."/>
</xsl:call-template>
</xsl:template>
This will strip left space from the initial node child of a <p> element, if it is a text node. It will not strip space from the first text node child, if it is not the first node child. E.g. in
<p><em>Hey</em> there</p>
I intentionally avoid stripping the space from the front of 'there', because that would make the words run together when rendered in a browser. If you did want to strip that space, change the match pattern to
match="p/text()[1]"
If you also want to strip trailing whitespace, as your title possibly implies, add these two templates:
<!-- strip trailing whitespace -->
<xsl:template match="p/node()[last()][self::text()]">
<xsl:call-template name="right-trim">
<xsl:with-param name="s" value="."/>
</xsl:call-template>
</xsl:template>
<!-- strip leading/trailing whitespace on sole text node -->
<xsl:template match="p/node()[position() = 1 and
position() = last()][self::text()]"
priority="2">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
The definitions of the left-trim and right-trim templates are at Trim Template for XSLT (untested). They might be slow for documents with lots of <p>s. If you can use XSLT 2.0, you can replace the call-templates with
<xsl:value-of select="replace(.,'^\s+','')" />
and
<xsl:value-of select="replace(.,'\s+$','')" />
(Thanks to Priscilla Walmsley.)
You want:
<xsl:template match="text()">
<xsl:value-of select=
"substring(
substring(normalize-space(concat('[',.,']')),2),
1,
string-length(.)
)"/>
</xsl:template>
This wraps the string in "[]", then performs normalize-string(), then finally removes the wrapping characters.

Producing a new line in XSLT

I want to produce a newline for text output in XSLT. Any ideas?
The following XSL code will produce a newline (line feed) character:
<xsl:text>
</xsl:text>
For a carriage return, use:
<xsl:text>
</xsl:text>
My favoured method for doing this looks something like:
<xsl:stylesheet>
<xsl:output method='text'/>
<xsl:variable name='newline'><xsl:text>
</xsl:text></xsl:variable>
<!-- note that the layout there is deliberate -->
...
</xsl:stylesheet>
Then, whenever you want to output a newline (perhaps in csv) you can output something like the following:
<xsl:value-of select="concat(elem1,elem2,elem3,$newline)" />
I've used this technique when outputting sql from xml input. In fact, I tend to create variables for commas, quotes and newlines.
Include the attribute Method="text" on the xsl:output tag and include newlines in your literal content in the XSL at the appropriate points. If you prefer to keep the source code of your XSL tidy use the entity
where you want a new line.
You can use: <xsl:text>
</xsl:text>
see the example
<xsl:variable name="module-info">
<xsl:value-of select="#name" /> = <xsl:value-of select="#rev" />
<xsl:text>
</xsl:text>
</xsl:variable>
if you write this in file e.g.
<redirect:write file="temp.prop" append="true">
<xsl:value-of select="$module-info" />
</redirect:write>
this variable will produce a new line infile as:
commons-dbcp_commons-dbcp = 1.2.2
junit_junit = 4.4
org.easymock_easymock = 2.4
IMHO no more info than #Florjon gave is needed. Maybe some small details are left to understand why it might not work for us sometimes.
First of all, the &#xa (hex) or &#10 (dec) inside a <xsl:text/> will always work, but you may not see it.
There is no newline in a HTML markup. Using a simple <br/> will do fine. Otherwise you'll see a white space. Viewing the source from the browser will tell you what really happened. However, there are cases you expect this behaviour, especially if the consumer is not directly a browser. For instance, you want to create an HTML page and view its structure formatted nicely with empty lines and idents before serving it to the browser.
Remember where you need to use disable-output-escaping and where you don't. Take the following example where I had to create an xml from another and declare its DTD from a stylesheet.
The first version does escape the characters (default for xsl:text)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" encoding="utf-8"/>
<xsl:template match="/">
<xsl:text><!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
</xsl:text>
<xsl:copy>
<xsl:apply-templates select="*" mode="copy"/>
</xsl:copy>
</xsl:template>
<xsl:template match="#*|node()" mode="copy">
<xsl:copy>
<xsl:apply-templates select="#*|node()" mode="copy"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
and here is the result:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
<Subscriptions>
<User id="1"/>
</Subscriptions>
Ok, it does what we expect, escaping is done so that the characters we used are displayed properly. The XML part formatting inside the root node is handled by ident="yes". But with a closer look we see that the newline character &#xa was not escaped and translated as is, performing a double linefeed! I don't have an explanation on this, will be good to know. Anyone?
The second version does not escape the characters so they're producing what they're meant for. The change made was:
<xsl:text disable-output-escaping="yes"><!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
</xsl:text>
and here is the result:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
<Subscriptions>
<User id="1"/>
</Subscriptions>
and that will be ok. Both cr and lf are properly rendered.
Don't forget we're talking about nl, not crlf (nl=lf). My first attempt was to use only cr:&#xd and while the output xml was validated by DOM properly.
I was viewing a corrupted xml:
<?xml version="1.0" encoding="utf-8"?>
<Subscriptions>riptions SYSTEM "Subscriptions.dtd">
<User id="1"/>
</Subscriptions>
DOM parser disregarded control characters but the rendered didn't. I spent quite some time bumping my head before I realised how silly I was not seeing this!
For the record, I do use a variable inside the body with both CRLF just to be 100% sure it will work everywhere.
You can try,
<xsl:text>
</xsl:text>
It will work.
I added the DOCTYPE directive you see here:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nl "
">
]>
<xsl:stylesheet xmlns:x="http://www.w3.org/2005/02/query-test-XQTSCatalog"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
This allows me to use &nl; instead of
to produce a newline in the output. Like other solutions, this is typically placed inside a <xsl:text> tag.
I second Nic Gibson's method, this was
always my favorite:
<xsl:variable name='nl'><xsl:text>
</xsl:text></xsl:variable>
However I have been using the Ant task <echoxml> to
create stylesheets and run them against files. The
task will do attribute value templates, e.g. ${DSTAMP} ,
but is also will reformat your xml, so in some
cases, the entity reference is preferable.
<xsl:variable name='nl'><xsl:text>
</xsl:text></xsl:variable>
I have found a difference between literal newlines in <xsl:text> and literal newlines using
.
While literal newlines worked fine in my environment (using both Saxon and the default Java XSLT processor) my code failed when it was executed by another group running in a .NET environment.
Changing to entities (
) got my file generation code running consistently on both Java and .NET.
Also, literal newlines are vulnerable to being reformatted by IDEs and can inadvertently get lost when the file is maintained by someone 'not in the know'.
I've noticed from my experience that producing a new line INSIDE a <xsl:variable> clause doesn't work.
I was trying to do something like:
<xsl:variable name="myVar">
<xsl:choose>
<xsl:when test="#myValue != ''">
<xsl:text>My value: </xsl:text>
<xsl:value-of select="#myValue" />
<xsl:text></xsl:text> <!--NEW LINE-->
<xsl:text>My other value: </xsl:text>
<xsl:value-of select="#myOtherValue" />
</xsl:when>
</xsl:choose>
<xsl:variable>
<div>
<xsl:value-of select="$myVar"/>
</div>
Anything I tried to put in that "new line" (the empty <xsl:text> node) just didn't work (including most of the simpler suggestions in this page), not to mention the fact that HTML just won't work there, so eventually I had to split it to 2 variables, call them outside the <xsl:variable> scope and put a simple <br/> between them, i.e:
<xsl:variable name="myVar1">
<xsl:choose>
<xsl:when test="#myValue != ''">
<xsl:text>My value: </xsl:text>
<xsl:value-of select="#myValue" />
</xsl:when>
</xsl:choose>
<xsl:variable>
<xsl:variable name="myVar2">
<xsl:choose>
<xsl:when test="#myValue != ''">
<xsl:text>My other value: </xsl:text>
<xsl:value-of select="#myOtherValue" />
</xsl:when>
</xsl:choose>
<xsl:variable>
<div>
<xsl:value-of select="$myVar1"/>
<br/>
<xsl:value-of select="$myVar2"/>
</div>
Yeah, I know, it's not the most sophisticated solution but it works, just sharing my frustration experience with XSLs ;)
I couldn't just use the <xsl:text>
</xsl:text> approach because if I format the XML file using XSLT the entity will disappear. So I had to use a slightly more round about approach using variables
<xsl:variable name="nl" select="'
'"/>
<xsl:template match="/">
<xsl:value-of select="$nl" disable-output-escaping="no"/>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:text xml:space="preserve">
</xsl:text>
just add this tag:
<br/>
it works for me ;) .