XSL disable-output-escaping removes whitespace

Part of the XML:
<text><b>Title</b> <b>Happy</b></text>
In my XSL I have:
<xsl:value-of select="text" disable-output-escaping="yes" />
My output becomes
**TitleHappy**
My spacing went missing - there's supposed to be a space between </b> and <b>.
I tried normalize-space(), it doesn't work.
Any suggestions? Thanks!

If you want whitespace from an XSL stylesheet, use:
<xsl:text> </xsl:text>
Whitespace is only preserved if it's recognized as a text node (i.e. in " a " both spaces will be recognized).
Whitespace from the original source XML has to be preserved by telling the parser, for example:
parser.setPreserveWhitespace(true);

As you're outputting HTML, you could substitute your space with a non-breaking space.

Do you have any control over the generation of the original XML? If so, you could try adding xml:space="preserve" to the text element which should tell the processor to keep the whitespace.
<text xml:space="preserve"><b>Title</b> <b>Happy</b></text>
Alternatively, try looking at the "xsl:preserve-space" element in XSLT.
<xsl:preserve-space elements="text"/>
Although I have never used this personally, it might be of some help. See W3Schools for more information.
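As a rough sketch of the suggestion above, and not necessarily what the asker's processor needs: the stylesheet below declares xsl:preserve-space for the text element and copies its child nodes with xsl:copy-of, so the <b> elements and the whitespace-only text node between them reach the output as nodes. Using xsl:copy-of instead of xsl:value-of with disable-output-escaping is a swapped-in technique, and the match pattern and the wrapping <p> are assumptions for illustration. If the XML parser has already dropped the whitespace-only node before the transform runs, this declaration cannot bring it back.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Keep whitespace-only text nodes that are children of <text> -->
  <xsl:preserve-space elements="text"/>

  <!-- Assumption: the element is called "text", as in the question -->
  <xsl:template match="text">
    <!-- Copy the <b> elements and the text nodes between them as nodes -->
    <p><xsl:copy-of select="node()"/></p>
  </xsl:template>
</xsl:stylesheet>
```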

Thank you, everyone, for your input.
Currently I am using MattH's suggestion, which is to test for the space and substitute a non-breaking space. Another method I thought of is to test for "</b> <b>" and substitute " </b><b>", because a space contained within the bold tags is actually output. Both methods worked, though I don't know what the implications are. I still can't figure out why the space is removed when it sits between two separate bold tags.

Related

How to strip Unicode soft hyphen from PDF bookmarks generated using XSL-FO

I'm converting DITA maps to PDF using the DITA Open Toolkit 1.7 and RenderX XEP. In the DITA topics, product names are inserted using conrefs. One of my product names is quite long. It caused layout problems when used within tables. Therefore I inserted a soft hyphen into the phrase that is reused via conref:
<ph id="PD_FineReader2Comp">DOXiS4 FineReader2­Components</ph>
This works nicely in the generated pages, but creates a problem in the bookmarks where a symbol is displayed in place of the soft hyphen.
Obviously, this is an encoding problem. It seems that UTF-8 characters are properly handled in PDF content, but not in PDF bookmarks where, according to the following sources, some UTF-16 characters can be used (but I did not understand which ones).
http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf
http://www.setasign.de/support/tips-and-tricks/use-unicode-in-string-values/
The DITA Open Toolkit seems to create bookmarks from topic titles using this code fragment:
<fo:bookmark>
<xsl:attribute name="internal-destination">
<xsl:call-template name="generate-toc-id"/>
</xsl:attribute>
<xsl:if test="$bookmarkStyle!='EXPANDED'">
<xsl:attribute name="starting-state">hide</xsl:attribute>
</xsl:if>
<fo:bookmark-title>
<xsl:value-of select="normalize-space($topicTitle)"/>
</fo:bookmark-title>
<xsl:apply-templates mode="bookmark"/>
</fo:bookmark>
The XSL stylesheet has version 2.0.
I would like to create an override that removes the offending character. How can I do this?
Is it possible to properly resolve the encoding problem? (Probably not possible).
Are there any XSL functions or attributes which remove whitespace other than space, tab, linefeed, and carriage return?
Or do I need special handling for the soft hyphen?
Small refinement: if you are using XSLT 2.0, xsl:sequence will be more efficient than xsl:value-of in this context. In XSLT 2.0 you should always prefer xsl:sequence over xsl:value-of.
The simple way to do this is to use the translate() function, which can be used to replace certain characters with other characters, or with nothing. It looks like this is the line that outputs the value you want to fix up:
<xsl:value-of select="normalize-space($topicTitle)"/>
So you could simply modify this to:
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', '')"/>
to remove all the soft hyphens. If you would like to replace them with spaces or ordinary hyphens, you could do either of the following, respectively:
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', ' ')"/>
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', '-')"/>
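To try the translate() approach in isolation, a minimal standalone sketch might look like the following. The topicTitle variable here is a hypothetical stand-in for the toolkit's own variable, and the soft hyphen is written as the character reference &#xAD; (U+00AD) so that the otherwise invisible character is visible in the stylesheet. It can be run against any XML input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical stand-in for the toolkit's $topicTitle;
       &#xAD; is the soft hyphen (U+00AD) -->
  <xsl:variable name="topicTitle"
      select="'DOXiS4 FineReader2&#xAD;Components'"/>

  <xsl:template match="/">
    <!-- The empty third argument makes translate() delete the character -->
    <xsl:value-of
        select="translate(normalize-space($topicTitle), '&#xAD;', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

With this input the transform writes DOXiS4 FineReader2Components, without the soft hyphen.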

RSS to XHTML using XSLT - How to remove strange characters?

I'm using XSLT to transform RSS files in XHTML.
In order to create a link I use this block of code:
<xsl:for-each select="channel/item">
<h3><xsl:value-of select="title"/></h3>
<xsl:value-of select="description"/>
</xsl:for-each>
But the result comes with some unwanted characters:
<h3><a href="%0A http://site.com/page.htm%0A ">
What am I doing wrong? Thanks in advance for your help.
It looks like the source has URL-encoded line feeds and some whitespace in it. Leading and trailing whitespace can be stripped using the normalize-space() function. The other stuff may be trickier, depending on how regular it is, and which version of XSLT you're using. If the URLs always end in "%0A ", you could do something like:
concat('http', substring-before(substring-after(link, 'http'), '%'))
This will only work all the time if your URLs are never going to have URLEncoded data in them (which might not be a safe assumption). If you're using XSLT 2.0, something like:
normalize-space(replace(link, '%0A', ''))
might work better.
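One possible reading, not confirmed by the question, is that the feed's <link> element contains literal line breaks and spaces around the URL, and that the serializer percent-encodes them as %0A when it writes the href attribute. Under that assumption, a sketch like the following would be enough, since normalize-space() trims the value before it reaches the attribute. The /rss match pattern, the wrapping <div> and the link child element are assumptions about the feed structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: an RSS 2.0 feed with rss/channel/item/link -->
  <xsl:template match="/rss">
    <div>
      <xsl:for-each select="channel/item">
        <h3>
          <!-- normalize-space() strips leading/trailing whitespace,
               including line breaks, from the link text -->
          <a href="{normalize-space(link)}">
            <xsl:value-of select="title"/>
          </a>
        </h3>
        <xsl:value-of select="description"/>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
```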

How can I translate &#039; into an apostrophe in XSLT

The relevant parts of the code:
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="and" select='"&#039;"' />
<xsl:value-of select="translate(products_name/node(),$and,$apos)"/>
I'm thinking this should be a simple thing and that the above code should work, but it doesn't affect the output at all.
(I used variables because names cannot begin with an ampersand, and using just an apostrophe brings up a compile error.)
I've tested the code to make sure the translate is working using strings and there are no errors there.
Any help would be greatly appreciated.
You are on the right track, but not quite there. The problem is that XSLT is itself written in XML, and for every XML language the parser decodes entities and character references automatically. The XSLT engine only sees the result.
As a result, the XSLT engine cannot distinguish whether you wrote ' or &#039; - they are the same. For your problem, this has two effects:
You have to use a variable containing the apostrophe, because the apostrophe itself delimits string literals in expressions. Even for <xsl:value-of select="translate(products_name/node(),$and,'&apos;')"/>, the XML parser transforms the entity into an apostrophe, i.e. <xsl:value-of select="translate(products_name/node(),$and,''')"/>
You have to escape the ampersand used in the string you search for: for the XSL engine, the variable "and" contains the value ', i.e. you are replacing an apostrophe with an apostrophe.
Working solution:
<xsl:variable name="apos">'</xsl:variable>
<xsl:value-of select='translate(text(), "&amp;#039;", $apos)'/>
Technically, there's no difference in any XML between &apos;, &#039; and ', they're different ways of representing exactly the same thing. Therefore, that translate call shouldn't do anything.
It depends on how you're transforming it, where that output is (attribute value or element?), and how the output is serialized to text, but your problem isn't with your XSLT.
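If the source text really does contain the literal six-character sequence &#039; (for instance because the ampersand was escaped twice upstream), note that translate() maps single characters, not substrings, so it would also touch any other &, #, 0, 3, 9 or ; in the text. A common XSLT 1.0 workaround is a recursive string-replace template. The following is a minimal sketch under that assumption; the template name replace-all and its parameter names are hypothetical, and the products_name match comes from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: replaces every occurrence of $find in $text -->
  <xsl:template name="replace-all">
    <xsl:param name="text"/>
    <xsl:param name="find"/>
    <xsl:param name="with"/>
    <xsl:choose>
      <xsl:when test="$find != '' and contains($text, $find)">
        <!-- Emit the part before the match, then the replacement -->
        <xsl:value-of select="substring-before($text, $find)"/>
        <xsl:value-of select="$with"/>
        <!-- Recurse on the remainder of the string -->
        <xsl:call-template name="replace-all">
          <xsl:with-param name="text" select="substring-after($text, $find)"/>
          <xsl:with-param name="find" select="$find"/>
          <xsl:with-param name="with" select="$with"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="products_name">
    <xsl:call-template name="replace-all">
      <xsl:with-param name="text" select="string(.)"/>
      <!-- &amp;#039; is the literal text &#039; after XML parsing -->
      <xsl:with-param name="find" select="'&amp;#039;'"/>
      <xsl:with-param name="with">'</xsl:with-param>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 2.0 the built-in replace() function covers the same case without recursion.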

Xslt: Embedding an image in a RSS feed

I am using Umbraco and I need to display an image in a Rss Feed. The feed is generated by Xslt.
Everything works if I do text stuff. Such stuff is technically feasible, but the feed I analyzed had been generated by WordPress.
The challenge is that I have no idea how I can embed an <img> element within my <content:encoded> tag.
I have a variable, say "url", that returns the full URL of the underlying image. How can I insert the <img> within <content:encoded>? Remember I am using XSLT to achieve the task.
<content:encoded>
<img src="{$url}" />
</content:encoded>
I guess that CDATA must be used, but I am not able to escape the illegal characters correctly :(
Thanks for your help.
Roland
roland, you're trying to escape things twice. It's unnecessary (not to mention hideous!) This page shows:
<content:encoded><![CDATA[This is <i>italics</i>.]]></content:encoded>
I.e. they're just escaping the markup inside the <content:encoded> once, and they use CDATA to do that. In your case, CDATA is awkward because you need to substitute $url in the middle. So you could use two CDATA sections wrapped around an <xsl:value-of select="$url" />: (indented for clarity)
<content:encoded>
<![CDATA[<img src="]]>
<xsl:value-of select='$url' />
<![CDATA[">]]>
</content:encoded>
But that would be needlessly verbose. The second CDATA section is unneeded. And we can do better while using the same principle: escape the markup characters (once) that would cause the string to be parsed into a tree. In your case, only the initial < needs to be escaped. You can use &lt; instead of CDATA to escape the <. Put this in your XSLT:
<content:encoded>&lt;img src="<xsl:value-of select='$url' />"></content:encoded>
The <xsl:value-of> is not really inside quotes, from XSLT's perspective... those quotes are just the content of text nodes. The <xsl:value-of> works as a normal XSLT instruction.
Change select='$url' to select="concat($siteUrl, photo)" if that's what you need. (I.e. photo is a child element of the context node, and its text value is the name of the image file.)
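As an alternative sketch, not taken from the answer above: xsl:output has a cdata-section-elements attribute that asks the serializer to wrap the text content of the named elements in a CDATA section, which gives the WordPress-style output without hand-written CDATA in the stylesheet. The example URL, the <item> wrapper and the match pattern are assumptions for illustration; the namespace URI is the one used by the RSS content module.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">

  <!-- Serialize text inside content:encoded as a CDATA section -->
  <xsl:output method="xml" cdata-section-elements="content:encoded"/>

  <!-- Hypothetical image URL; in the question this is the "url" variable -->
  <xsl:param name="url" select="'http://example.com/image.jpg'"/>

  <xsl:template match="/">
    <item>
      <content:encoded>
        <!-- The markup is built as plain text, escaped once in the stylesheet -->
        <xsl:text>&lt;img src="</xsl:text>
        <xsl:value-of select="$url"/>
        <xsl:text>" /&gt;</xsl:text>
      </content:encoded>
    </item>
  </xsl:template>
</xsl:stylesheet>
```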

How do I format text in between xsl:text tags?

I have an xslt sheet with some text similar to below:
<xsl:text>I am some text, and I want to be bold</xsl:text>
I would like some text to be bold, but this doesn't work.
<xsl:text>I am some text, and I want to be <strong>bold<strong></xsl:text>
The deprecated b tag doesn't work either. How do I format text within an xsl:text tag?
Try this:
<fo:inline font-weight="bold"><xsl:text>Bold text</xsl:text></fo:inline>
See: XSL-FO Tutorial: Inline Text Formatting; XSL-FO inline Object
You don't. xsl:text can only contain text nodes, and <strong> is an element node, not a string that starts with a less-than character; XSLT is about creating node trees, not markup. So, you have to do
<xsl:text>I am some text, and I want to be </xsl:text>
<strong>bold</strong>
<xsl:text> </xsl:text>
<xsl:text disable-output-escaping="yes">I want to be &lt;strong&gt;bold&lt;/strong&gt; </xsl:text>
The answer for this depends on how much formatting is needed in the content and also where you get content from.
If you have little content and little formatting, then you can use what jelovirt suggested:
<xsl:text>I am some text, and I want to be </xsl:text>
<strong>bold</strong>
<xsl:text> </xsl:text>
However, if your content has a lot of formatting, then what David Medinets suggests is a better way to do it:
<xsl:text disable-output-escaping="yes">
We have some instructions to print in the UI. The set of instructions is huge, and of course we read them from an XML file.
In such cases the above method is easy to use and to maintain, because the content is provided by technical writers. They have no knowledge of XSL, but they know how to use HTML tags and can easily edit the XML file.
The correct way to use the strong tag is
<strong>This text is strong</strong>
with a closing </strong> at the end, not a second <strong>.
Here is the information reference: https://www.w3schools.com/html/html_formatting.asp