How to keep commas from splitting columns in comma-delimited CSV generated with XSLT

I'm using XSLT to generate CSV files. Sometimes the XML contains text like "a, b". When I open the CSV in Excel with comma as the delimiter, Excel splits that value into separate columns, but I want it to stay in one column. Is there a way to handle this in the XSLT?

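To keep such a value in a single column, you need to enclose any value that contains a comma in double quotes. This is possible in XSLT, but the exact answer depends on your stylesheet design; if you want a more precise answer, please share your code. In general, you can use the following template to wrap the text nodes of interest in quotes:
<xsl:template match="text()[contains(., ',')]">
<xsl:value-of select="concat('&quot;', ., '&quot;')"/>
</xsl:template>
If a value can itself contain double quotes, RFC 4180 also requires each of them to be doubled (in XSLT 2.0, for example, with replace(., '&quot;', '&quot;&quot;')).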
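The quoting rule is easier to see outside XSLT, so here is a small Python sketch of RFC 4180-style field quoting. It is not the answer's stylesheet. `csv_field` and `csv_row` are hypothetical helper names, and Python's own `csv` module appears only as a cross-check.

```python
import csv
import io


def csv_field(value: str) -> str:
    """Quote one CSV field following RFC 4180: a field containing a comma,
    a double quote or a line break is wrapped in double quotes, and every
    embedded double quote is doubled."""
    if any(ch in value for ch in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value


def csv_row(values: list[str]) -> str:
    """Join already-stringified values into one CSV record (no line ending)."""
    return ",".join(csv_field(v) for v in values)


row = ["a, b", 'say "hi"', "plain"]
print(csv_row(row))  # "a, b","say ""hi""",plain

# Cross-check with the standard library writer (QUOTE_MINIMAL by default).
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
print(buf.getvalue(), end="")  # same record as above
```

Excel keeps a quoted field in one column, so "a, b" stays in a single cell.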

You can get clues from the open-source CSV-to-XML package in XSLT 2.0 that I've published in the "Free Developer" section of my website (http://www.CraneSoftwrights.com/resources/#csv). It follows RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt).
The idea is to look for quoted fields first, and to split on commas only where there are no quotes. This can be expressed as a regular expression, as I have done in the code cited above.
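The "quotes first, then commas" idea can be sketched in Python with one regular expression. This is my own illustration, not the code from the Crane Softwrights package. `parse_csv_line` is a hypothetical name, and the sketch handles a single record only, with no line breaks inside fields.

```python
import re

# One CSV field: either a quoted field (tried first), in which "" stands for a
# literal quote, or an unquoted run containing no comma and no quote.
FIELD = re.compile(r'"((?:[^"]|"")*)"|([^,"]*)')


def parse_csv_line(line: str) -> list[str]:
    """Split one CSV record (without its line ending) into field values."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)  # always matches: the second branch may be empty
        if m.group(1) is not None:
            fields.append(m.group(1).replace('""', '"'))  # quoted: un-double quotes
        else:
            fields.append(m.group(2))  # unquoted: taken verbatim
        pos = m.end()
        if pos >= len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"unexpected {line[pos]!r} at position {pos}")
        pos += 1  # skip the separating comma and read the next field


print(parse_csv_line('"a, b",c'))         # ['a, b', 'c']
print(parse_csv_line('"say ""hi""",x,'))  # ['say "hi"', 'x', '']
```

Trying the quoted branch first matters. Otherwise the comma inside "a, b" would be treated as a separator.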

Related

How to remove dashes in numbers WITHOUT removing leading zeros

The input values (account numbers) are currently in the format 005-947864-296. I'm using the translate function to remove the dashes as follows: <xsl:value-of select="translate(($account_number), '-', '')"/> The problem is that the output I get in the CSV is 5947864296, i.e. the leading zeros are removed. How do I remove the dashes WITHOUT removing the leading zeros?
I'm using XSLT 2.0, and I tried both the translate and replace functions but got the same result!
Perhaps you are viewing the generated CSV by loading it into a spreadsheet program such as Microsoft Excel? Excel (notoriously) assumes that if a field is all-numeric, leading zeroes are insignificant and can be discarded (which is not the case for things such as account numbers).
The problem isn't with your XSLT code generating the CSV, it's with the application you are using to read/process the CSV.
If you're using XSLT 2.0 or later, you can use the replace function with a regex:
<xsl:value-of select="replace($account_number, '-', '')"/>
If you're using XSLT 1.0, translate() with an empty replacement string removes the dashes:
<xsl:value-of select="translate($account_number, '-', '')"/>
Neither function touches the leading zeros. They only disappear when a program such as Excel interprets the all-digit field as a number.
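Modelling translate() in Python shows that the zeros survive the string operation. `xpath_translate` is a hypothetical helper that follows the XPath translate() rules. It is not an XSLT processor. The final int() call is only a rough analogy for a spreadsheet converting an all-digit field to a number. The sketch also shows that a mapping such as '-0123456789' → '0123456789' shifts every digit rather than removing dashes.

```python
def xpath_translate(s: str, src: str, dst: str) -> str:
    """Model of XPath translate(): each character of s that occurs in src is
    replaced by the character at the same position in dst, or removed if dst
    is shorter. Only the first occurrence of a character in src counts."""
    out = []
    for ch in s:
        i = src.find(ch)
        if i == -1:
            out.append(ch)      # not mentioned in src: copied unchanged
        elif i < len(dst):
            out.append(dst[i])  # mapped to its counterpart in dst
        # otherwise there is no counterpart in dst, so the character is dropped
    return "".join(out)


account = "005-947864-296"

# Removing the dashes keeps the leading zeros; the result is still a string.
print(xpath_translate(account, "-", ""))                      # 005947864296

# A mapping of '-0123456789' onto '0123456789' shifts every digit instead.
print(xpath_translate(account, "-0123456789", "0123456789"))  # 116058975037

# The zeros vanish only once the value is treated as a number.
print(int(xpath_translate(account, "-", "")))                 # 5947864296
```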

How to strip Unicode soft hyphen from PDF bookmarks generated using XSL-FO

I'm converting DITA maps to PDF using the DITA Open Toolkit 1.7 and RenderX XEP. In the DITA topics, product names are inserted using conrefs. One of my product names is quite long. It caused layout problems when used within tables. Therefore I inserted a soft hyphen into the phrase that is reused via conref:
<ph id="PD_FineReader2Comp">DOXiS4 FineReader2&#xAD;Components</ph>
This works nicely in the generated pages, but creates a problem in the bookmarks where a symbol is displayed in place of the soft hyphen.
Obviously, this is an encoding problem. It seems that UTF-8 characters are handled properly in the PDF content but not in PDF bookmarks, where, according to the following sources, some UTF-16 characters can be used (but I did not understand which ones).
http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf
http://www.setasign.de/support/tips-and-tricks/use-unicode-in-string-values/
The DITA Open Toolkit seems to create bookmarks from topic titles using this code fragment:
<fo:bookmark>
<xsl:attribute name="internal-destination">
<xsl:call-template name="generate-toc-id"/>
</xsl:attribute>
<xsl:if test="$bookmarkStyle!='EXPANDED'">
<xsl:attribute name="starting-state">hide</xsl:attribute>
</xsl:if>
<fo:bookmark-title>
<xsl:value-of select="normalize-space($topicTitle)"/>
</fo:bookmark-title>
<xsl:apply-templates mode="bookmark"/>
</fo:bookmark>
The XSL stylesheet has version 2.0.
I would like to create an override that removes the offending character. How can I do this?
Is it possible to properly resolve the encoding problem? (Probably not possible).
Are there any XSL functions or attributes which remove whitespace other than space, tab, linefeed, and carriage return?
Or do I need special handling for the soft hyphen?
Small refinement: if you are using XSLT 2.0, <xsl:sequence> will be more efficient than <xsl:value-of> in this context. In XSLT 2.0 you should always prefer xsl:sequence over xsl:value-of.
The simple way to do this is to use the translate() function, which can be used to replace certain characters with other characters, or with nothing. It looks like this is the line that outputs the value you want to fix up:
<xsl:value-of select="normalize-space($topicTitle)"/>
So you could simply modify this to:
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', '')"/>
to remove all the soft hyphens. If you would like to replace them with spaces or ordinary hyphens, you could do either of the following, respectively:
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', ' ')"/>
<xsl:value-of select="translate(normalize-space($topicTitle), '&#xAD;', '-')"/>
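Here is a Python sketch of the same transformation, useful for checking the expected bookmark text outside the DITA toolchain: normalize-space() followed by removing or replacing U+00AD. `normalize_space` and `bookmark_title` are hypothetical helpers. The first approximates XPath normalize-space() by collapsing only the four XML whitespace characters.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, written &#xAD; or &#173; in XML


def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): collapse runs of space, tab, CR
    and LF into one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def bookmark_title(title: str, replacement: str = "") -> str:
    """normalize-space() first, then drop (or replace) every soft hyphen."""
    return normalize_space(title).replace(SOFT_HYPHEN, replacement)


title = "DOXiS4 FineReader2\u00adComponents"
print(bookmark_title(title))       # DOXiS4 FineReader2Components
print(bookmark_title(title, "-"))  # DOXiS4 FineReader2-Components
```

The soft hyphen is not whitespace, so normalize-space() alone never removes it. That is why an explicit translate() (or replace()) step is needed.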

How can I translate &#039; into an apostrophe in xslt

The relevant parts of the code:
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="and" select='"&#039;"' />
<xsl:value-of select="translate(products_name/node(),$and,$apos)"/>
I'm thinking this should be a simple thing and that the above code should work, but it doesn't affect the output at all.
(I used variables because names cannot begin with an ampersand, and using just an apostrophe brings up a compile error.)
I've tested the code to make sure the translate is working using strings and there are no errors there.
Any help would be greatly appreciated.
You are on the right track, but not quite there. Your problem is that XSLT is itself a language written in XML. For every XML language, the parser automatically decodes entity and character references, and the XSLT engine only comes afterwards.
As a result, the XSLT engine neither does nor can distinguish whether you wrote &#039; or '; it's the same. For your problem, this has two effects:
You have to use a variable containing the apostrophe, because the apostrophe itself is reserved for delimiting string literals in XPath expressions. Even for <xsl:value-of select="translate(products_name/node(),$and,'&#039;')"/>, the XML parser transforms the reference into an apostrophe, so the XSLT engine sees <xsl:value-of select="translate(products_name/node(),$and,''')"/>, which is not a valid expression.
You have to escape the ampersand in the string you search for (&amp;#039;): for the XSLT engine, your variable "and" contains the value ', i.e. you are replacing an apostrophe with an apostrophe.
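Working solution (XSLT 2.0). Note that translate() maps single characters: with "&#039;" as its search string it would turn every & into an apostrophe and delete every #, 0, 3, 9 and ; from the text. replace() substitutes the whole sequence instead:
<xsl:variable name="apos">'</xsl:variable>
<xsl:value-of select='replace(text(), "&amp;#039;", $apos)'/>
In XSLT 1.0, which has no replace(), a recursive string-replace template built on substring-before() and substring-after() does the same job.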
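Python's standard XML parser can confirm the parsing behaviour this answer describes. This is an illustration in Python rather than XSLT. The sample text "Women&amp;#039;s 30-pack" is made up. The last step models what a character-by-character translate(., '&#039;', "'") would do to it.

```python
import xml.etree.ElementTree as ET

# 1. The parser decodes entity and character references before any XSLT
#    (or Python) code sees the text, so these three spellings are identical.
spellings = ["<p>it's</p>", "<p>it&apos;s</p>", "<p>it&#039;s</p>"]
decoded = [ET.fromstring(m).text for m in spellings]
print(decoded)  # ["it's", "it's", "it's"]

# 2. Double-escaped data is different: &amp;#039; survives parsing as the
#    six literal characters "&#039;", and that is what needs replacing.
raw = ET.fromstring("<p>Women&amp;#039;s 30-pack</p>").text
print(raw)  # Women&#039;s 30-pack

# 3. Replacing the sequence as a unit (like XSLT 2.0 replace()) works ...
print(raw.replace("&#039;", "'"))  # Women's 30-pack

# 4. ... whereas a per-character mapping maps & to ' and deletes # 0 3 9 ;
#    everywhere, including the digits of "30".
char_map = str.maketrans({"&": "'", "#": None, "0": None, "3": None, "9": None, ";": None})
print(raw.translate(char_map))  # Women's -pack
```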
Technically, there's no difference in any XML between &apos;, &#039; and '; they're different ways of representing exactly the same thing. Therefore, that translate call shouldn't do anything.
It depends on how you're transforming it, where that output is (attribute value or element?), and how the output is serialized to text, but your problem isn't with your XSLT.

XSLT: Embedding an image in an RSS feed

I am using Umbraco and I need to display an image in an RSS feed. The feed is generated by XSLT.
Everything works if I output only text. Embedding an image is technically feasible, but the feed I analyzed that does it had been generated by WordPress.
The challenge is that I have no idea how I can embed an <img> tag within my <content:encoded> tag.
I have a variable, say "url", that returns the full URL of the underlying image. How can I insert the image within <content:encoded>? Remember that I am using XSLT to achieve the task.
<content:encoded>
<img src="{$url}" />
</content:encoded>
I guess that CDATA must be used, but I am not able to correctly escape the illegal characters :(
Thanks for your help.
Roland
Roland, you're trying to escape things twice. It's unnecessary (not to mention hideous!). This page shows:
<content:encoded><![CDATA[This is <i>italics</i>.]]></content:encoded>
I.e. they're just escaping the markup inside the <content:encoded> once, and they use CDATA to do that. In your case, CDATA is awkward because you need to substitute $url in the middle. So you could use two CDATA sections wrapped around an <xsl:value-of select="$url" />: (indented for clarity)
<content:encoded>
  <![CDATA[<img src="]]>
  <xsl:value-of select='$url' />
  <![CDATA[">]]>
</content:encoded>
But that would be needlessly verbose, and the second CDATA section is unnecessary anyway. We can do better using the same principle: escape (once) the markup characters that would otherwise cause the string to be parsed into a tree. In your case, only the initial < needs to be escaped, and you can use &lt; instead of CDATA to do that. Put this in your XSLT:
<content:encoded>&lt;img src="<xsl:value-of select='$url' />"></content:encoded>
The <xsl:value-of> is not really inside quotes, from XSLT's perspective... those quotes are just the content of text nodes. The <xsl:value-of> works as a normal XSLT instruction.
Change select='$url' to select="concat($siteUrl, photo)" if that's what you need. (I.e. photo is a child element of the context node, and its text value is the name of the image file.)
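The "escape once" principle can be checked with Python's ElementTree. When the <img> markup is stored as text, the serializer escapes it exactly once, and a feed reader that parses the item gets the original HTML string back. This is a Python illustration, not Umbraco or XSLT. `item_with_image` is a hypothetical helper and the URL is made up. The namespace is the one defined by the RSS content module for <content:encoded>.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module that defines <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)


def item_with_image(url: str) -> str:
    """Build an <item> whose <content:encoded> holds an <img> tag as text,
    so that the serializer escapes the markup exactly once."""
    item = ET.Element("item")
    encoded = ET.SubElement(item, f"{{{CONTENT_NS}}}encoded")
    encoded.text = f'<img src="{url}">'
    return ET.tostring(item, encoding="unicode")


feed_item = item_with_image("http://example.com/photo.jpg?w=200&h=100")
print(feed_item)  # the <img> arrives as &lt;img ...&gt;, and & as &amp;

# A feed reader parses the item and recovers the HTML exactly once-decoded.
parsed = ET.fromstring(feed_item).find(f"{{{CONTENT_NS}}}encoded").text
print(parsed)  # <img src="http://example.com/photo.jpg?w=200&h=100">
```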

XSL disable-output-escaping removes whitespaces

Part of the XML:
<text><b>Title</b> <b>Happy</b></text>
In my XSL I have:
<xsl:value-of select="text" disable-output-escaping="yes" />
My output becomes
**TitleHappy**
My spacing went missing - there's supposed to be a space between </b> and <b>.
I tried normalize-space(), but it doesn't work.
Any suggestions? Thanks!
If you want to output whitespace from the XSL, use:
<xsl:text> </xsl:text>
In the stylesheet, whitespace is only preserved if it is recognized as part of a text node (i.e. in " a " both spaces are kept); whitespace-only text between instructions is stripped.
Whitespace from the original source XML has to be preserved by telling the parser, for example:
parser.setPreserveWhitespace(true);
As you're outputting HTML, you could substitute a non-breaking space for your space.
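Python can show why the space disappears. The space between </b> and <b> is a whitespace-only text node, and the string value of <text> keeps it only if the parser kept that node. ElementTree itself keeps it. `strip_whitespace_only_nodes` is a hypothetical stand-in for a parser configured to discard such nodes. It does not reproduce any particular product's behaviour.

```python
import xml.etree.ElementTree as ET

SOURCE = "<text><b>Title</b> <b>Happy</b></text>"


def string_value(elem: ET.Element) -> str:
    """XPath-style string value: all descendant text in document order."""
    return "".join(elem.itertext())


def strip_whitespace_only_nodes(elem: ET.Element) -> ET.Element:
    """Hypothetical stand-in for a parser that discards whitespace-only text
    nodes. Modifies the tree in place and returns it."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_whitespace_only_nodes(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None  # e.g. the " " between </b> and <b>
    return elem


print(string_value(ET.fromstring(SOURCE)))                               # Title Happy
print(string_value(strip_whitespace_only_nodes(ET.fromstring(SOURCE))))  # TitleHappy
```

A space inside a <b> element is not whitespace-only once it sits next to other text, so it survives the stripping. That matches the workaround of moving the space inside the bold tags.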
Do you have any control over the generation of the original XML? If so, you could try adding xml:space="preserve" to the text element which should tell the processor to keep the whitespace.
<text xml:space="preserve"><b>Title</b> <b>Happy</b></text>
Alternatively, try looking at the "xsl:preserve-space" element in XSLT.
<xsl:preserve-space elements="text"/>
Although I have never used this personally, it might be of some help. See W3Schools for more information.
Thank you for everyone's input.
Currently I am using MattH's suggestion, which is to test for the space and substitute a non-breaking space. Another method I thought of is to test for "</b> <b>" and substitute " </b><b>"; a space contained within the bold tags is actually output. Both methods worked, but I don't know what the implications are. And I still can't figure out why the space is removed when it is found between two separate bold tags.