Removing non-alphanumeric characters from string in XSL - xslt

How do I remove non-alphanumeric characters from a string in XSL?

If you define non-alphanumeric as [^a-zA-Z0-9]:
<xsl:value-of select="
translate(
string,
translate(
string,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
''
),
''
)
" />
Note that this is for XSLT 1.0. In XSLT 2.0 you can work with regexes directly, using replace().
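The nested translate() call above works in two steps: the inner call deletes every alphanumeric character, which leaves exactly the characters you want to drop, and the outer call then deletes those from the original string. A minimal XSLT 1.0 sketch of the same idea, assuming a hypothetical input element named s and illustrative variable names (alnum, unwanted) that are not part of the original answer:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Characters to keep (variable name chosen for illustration) -->
  <xsl:variable name="alnum"
      select="'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'"/>

  <!-- "s" is an assumed element name holding the input string -->
  <xsl:template match="s">
    <!-- Inner step: remove the alphanumerics, leaving only unwanted characters -->
    <xsl:variable name="unwanted" select="translate(., $alnum, '')"/>
    <!-- Outer step: remove those unwanted characters from the original -->
    <xsl:value-of select="translate(., $unwanted, '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to <s>Hello, World! 42</s>, the intermediate value is the comma, spaces and exclamation mark, and the output is HelloWorld42.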

For XSLT 2.0 you can use replace() as follows:
<xsl:value-of select="replace(<string>, '[^a-zA-Z0-9]', '')" />

Related

Using XSLT, how to translate " with &apos;

Using XSLT, I need to create a CSV report while processing an XML document.
During processing, I check whether one of the columns contains a comma (,); if so, I put quotes around the value. My problem is that the value may already contain &quot; ("), which will break the CSV format.
In XSLT, how can I replace every &quot; (") in a string with &apos; (')?
I tried using fn:translate:
<xsl:value-of select="fn:translate(., '&quot;', '&apos;')"/>
is rejected because it sees it as
<xsl:value-of select="fn:translate(., '"', ''')"/>
Any suggestion?
Current XSLT
<xsl:function name="cm:forCSV">
<xsl:param name="node" as="item()*"/>
<xsl:if test="contains($node, ',')">
<xsl:text>"</xsl:text>
</xsl:if>
<xsl:choose>
<xsl:when test="$node instance of xs:string">
<xsl:value-of select="fn:normalize-space(fn:translate($node, '&quot;', '&apos;'))"/>
</xsl:when>
<xsl:otherwise>
...
</xsl:otherwise>
</xsl:choose>
<xsl:if test="contains($node, ',')">
<xsl:text>"</xsl:text>
</xsl:if>
</xsl:function>
Sample data:
<sample value="test">an example, to be saved in a column
Expected:
<sample value='test'>an example, to be saved in a column
To have in my CSV
..., "<sample value='test'>an example, to be saved in a column", ...
Since you are using XSLT 2.0 or higher, you can escape apostrophes and quotes by doubling them:
<xsl:value-of select="translate($string, '&quot;', '&apos;&apos;')" />
From the XPath 2.0 specification:
If the literal is delimited by apostrophes, two adjacent apostrophes within the literal are interpreted as a single apostrophe. Similarly, if the literal is delimited by quotation marks, two adjacent quotation marks within the literal are interpreted as one quotation mark.
In XSLT 1.0, you would have to do:
<xsl:variable name="apos" select='"&apos;"'/>
<xsl:value-of select="translate($string, '&quot;', $apos)" />
Another option for referring to quotes, apostrophes, ampersands, etc, in XSLT is to use the codepoints-to-string function to generate them:
<xsl:value-of select="translate(., codepoints-to-string(34), codepoints-to-string(39))"/>
... or define variables using ' to enclose " and vice versa:
<xsl:variable name="quote" select="'&quot;'"/>
<xsl:variable name="apos" select='"&apos;"'/>
... or using a text node to define the value:
<xsl:variable name="quote">"</xsl:variable>
<xsl:variable name="apos">'</xsl:variable>
... and then refer to the variables:
<xsl:value-of select="translate(., $quote, $apos)"/>
But actually I'd suggest that rather than translate " to ', you could use a regular expression to find " characters in the input string and double them (i.e. replace each " with "") which is the standard way to include " characters in a quoted CSV field.
<xsl:variable name="quote-regex">(")</xsl:variable>
<xsl:value-of select="replace(., $quote-regex, '$1$1')"/>
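The quote-doubling approach can be sketched as a small XSLT 2.0 function. This is only an illustration under stated assumptions: the function name cm:csv-field and the namespace URI urn:example:csv are hypothetical (the question does not show what cm is bound to), and the sketch handles commas and quotes only, not embedded newlines.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:cm="urn:example:csv"
    exclude-result-prefixes="xs cm">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: returns the value as one CSV field -->
  <xsl:function name="cm:csv-field" as="xs:string">
    <xsl:param name="value" as="xs:string"/>
    <!-- A single double-quote character -->
    <xsl:variable name="quote" select="'&quot;'"/>
    <xsl:sequence select="
      if (contains($value, ',') or contains($value, $quote))
      then concat($quote,
                  replace($value, $quote, concat($quote, $quote)),
                  $quote)
      else $value"/>
  </xsl:function>

  <!-- Example use: emit the string value of the document as one field -->
  <xsl:template match="/">
    <xsl:value-of select="cm:csv-field(string(.))"/>
  </xsl:template>
</xsl:stylesheet>
```

A value is wrapped in quotes only when it contains a comma or a quote, and every quote inside it is doubled, so the original characters survive a round trip through a CSV reader.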
In XPath, you can use single quotes to quote a double quote and vice versa:
fn:translate(., '"', "'")
When the XPath expression is the select attribute of an XML element, you can put it in single quotes and escape the single quotes in the expression as &apos;, or similarly put it in double quotes:
<xsl:value-of select="fn:translate(., '&quot;', &quot;'&quot;)"/>

xslt 1.0 template that reduces multiple spaces to a single space

In my XSLT 2.0 stylesheet, I use the following template, which reduces multiple spaces to a single space.
<xsl:template match="text()">
<xsl:value-of select="replace(., '\s+', ' ')"/>
</xsl:template>
I'd like to do the same thing in a XSLT 1.0 stylesheet, but the "replace" function is not supported. Any suggestions for what I can do?
You could use normalize-space():
<xsl:template match="text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
This strips leading and trailing whitespace and collapses each run of whitespace characters (spaces, tabs, newlines) into a single space.
For reference: https://developer.mozilla.org/en-US/docs/Web/XPath/Functions/normalize-space

Formatting string (Removing leading zeros)

I am a newbie to XSLT. My requirement is to transform an XML file into a text file as per the business specifications. I am having trouble with one piece of string formatting. Please help me out if you have any ideas.
Here is the part of input xml data:
"0001295"
Expected result to print into text file:
1295
My main issue is to remove leading Zeros. Please share if you have any logic/function.
Just use this simple expression:
number(.)
Here is a complete example:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="t">
<xsl:value-of select="number(.)"/>
</xsl:template>
</xsl:stylesheet>
When applied on this XML document:
<t>0001295</t>
the wanted, correct result is produced:
1295
II. Use format-number()
format-number(., '#')
There are a couple of ways you can do this. If the value is entirely numeric (for example, not a CSV line or part of a product code such as ASN0012345), you can convert from a string to a number and back to a string again:
string(number($value))
Otherwise just replace the 0s at the start:
replace( $value, '^0+', '' )
The '^' is required (standard regexp syntax); without it, a value of 001201 would become 121 (all zeros removed). Use 0+ rather than 0*, because replace() raises an error when the pattern can match a zero-length string.
Hope that helps.
Dave
Here is one way you could do it in XSLT 1.0.
First, find the first non-zero character by removing all the zeros from the value and taking the first character of what remains:
<xsl:variable name="first" select="substring(translate(., '0', ''), 1, 1)" />
Then take the substring-before this first character (that is, the leading zeros) and use substring-after to get everything that follows them:
<xsl:value-of select="substring-after(., substring-before(., $first))" />
Or, to combine the two statements into one
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
So, given the following input
<a>00012095Kb</a>
Then using the following XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/a">
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
</xsl:template>
</xsl:stylesheet>
The following will be output
12095Kb
As a simple alternative in XSLT 2.0 that can be used with numeric or alpha-numeric input, with or without leading zeros, you might try:
replace( $value, '^0*(..*)', '$1' )
This works because ^0* is greedy and (..*) captures the rest of the input after the last leading zero. $1 refers to the captured group.
Note that an input containing only zeros will output 0.
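To make those cases concrete, here is a small XSLT 2.0 sketch that runs the expression over a few sample strings. The sample values are illustrative (two are taken from this thread), and the stylesheet ignores the content of whatever input document it is given.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Iterate over literal strings; "." is the current string -->
    <xsl:for-each select="('0001295', '00012095Kb', '000', 'abc')">
      <!-- Print "input -> result", one pair per line -->
      <xsl:value-of select="concat(., ' -> ',
                                   replace(., '^0*(..*)', '$1'),
                                   '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The four results are 1295, 12095Kb, 0 and abc: the all-zero input keeps one zero because (..*) must capture at least one character.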
XSLT 2.0
Remove leading zeros from STRING
<xsl:value-of select="replace( $value, '^0+', '')"/>
You could use a recursive template that will remove the leading zeros:
<xsl:template name="remove-leading-zeros">
<xsl:param name="text"/>
<xsl:choose>
<xsl:when test="starts-with($text,'0')">
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text"
select="substring-after($text,'0')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Invoke it like this:
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text" select="/path/to/node/with/leading/zeros"/>
</xsl:call-template>
</xsl:template>
<xsl:value-of select="number(.) * 1"/>
works for me
Many XSLT 1.0 processors, such as libxslt (the XSLT library built on libxml2), let the host language register extension functions, so we can make use of that. Suppose that the language calling XSLT is PHP: see this wikibook about registerPHPFunctions.
The built-in PHP function ltrim can be used in
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://php.net/xsl">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="test">
show <xsl:value-of select="fn:function('ltrim',string(.),'0')" />",
</xsl:template>
</xsl:stylesheet>
Now imagine a slightly more complex problem: left-trimming zeros in a string that contains more than one number, e.g. hello 002 and 021, bye.
The solution is the same: use registerPHPFunctions, but replace the built-in function with a user-defined one,
function ltrim0_Multi($s) {
return preg_replace('/(^0+|(?<= )0+)(?=[1-9])/','',$s);
}
converts the example into hello 2 and 21, bye.

Strip leading spaces only

Given element:
<comments> comments
go here
</comments>
How can I strip what may be multiple leading space characters. I cannot use normalize space because I need to retain newlines and such. XSLT 2.0 ok.
In XPath 1.0 (means XSLT 1.0, too):
substring($input,
string-length(
substring-before($input,
substring(translate($input, ' ', ''),
1,
1)
)
) +1
)
Wrapped in an XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:variable name="input"
select="string(/*/text())"/>
<xsl:template match="/">
'<xsl:value-of select=
"substring($input,
string-length(
substring-before($input,
substring(translate($input, ' ', ''),
1,
1)
)
) +1
)
"/>'
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the following XML document:
<t> XXX YYY Z</t>
the correct, wanted result is produced:
'XXX YYY Z'
Use the replace() function:
replace($input,'^ +','')
That handles leading space characters only up to the first non-space. If you want to remove all leading whitespace characters (i.e. space, nl, cr, tab) up to the first non-whitespace, use:
replace($input,'^\s+','')
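Applied to the comments element from the question, the first replace() call could be wrapped as follows. This is a minimal XSLT 2.0 sketch. It relies on ^ matching only at the very start of the string when no flags are given, so later line breaks and indentation stay untouched.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="comments">
    <!-- Strip spaces at the start of the whole string only;
         newlines and the text after them are preserved -->
    <xsl:value-of select="replace(., '^ +', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

If the goal were instead to strip leading spaces from every line, the 'm' (multi-line) flag makes ^ match after each newline as well: replace(., '^ +', '', 'm').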

XSLT: Regular Expression function does not work?

Ok, this one has been driving me up the wall...
I have a xslt function that is supposed to split out the Zip-code part from a Zip+City string depending on the country. I cannot get it to work! This is what I got so far:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:function name="exslt:GetZip" as="xs:string">
<xsl:param name="zipandcity" as="xs:string"/>
<xsl:param name="countrycode" as="xs:string"/>
<xsl:choose>
<xsl:when test="$countrycode='DK'">
<xsl:analyze-string select="$zipandcity" regex="(\d{4}) ([A-Za-zÆØÅæøå]{3,24})">
<xsl:matching-substring>
<xsl:value-of select="regex-group(1)"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:text>fail</xsl:text>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:when>
<xsl:otherwise>
<xsl:text>error</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
I am running it on a source XML where the following values are passed to the function:
zipandcity: "DK-2640 København SV"
countrycode: "DK"
...will output 'fail'!
I think there is something I am misunderstanding here...
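Aside from the facts that regexes aren't supported until XSLT 2.0 and that braces have to be escaped (but backslashes don't), there's one more reason why that code won't work: xsl:analyze-string does not anchor the regex; it splits the input into matching and non-matching substrings and processes each in turn. Given the string DK-2640 København SV, your regex only matches 2640 København, so the leftover DK- and SV pieces each go through xsl:non-matching-substring and emit fail. One fix is to "pad" the regex so that it consumes the whole string: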
regex=".*(\d{{4}}) ([A-Za-zÆØÅæøå]{{3,24}}).*"
.* is probably sufficient in this case, but sometimes you have to be more specific. For example, if there's more than one place where \d{4} could match, you might use \D* at the beginning to make sure the first capturing group matches the first bunch of digits.
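Putting the padded regex back into the function from the question might look like the sketch below. It uses the \D* variant with doubled braces. The my prefix, its namespace URI urn:example:zip, and the function name my:get-zip are placeholders introduced here, and the country-code branch is left out for brevity.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:zip"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: extracts a 4-digit zip code -->
  <xsl:function name="my:get-zip" as="xs:string">
    <xsl:param name="zipandcity" as="xs:string"/>
    <!-- Braces are doubled because regex is an attribute value template;
         \D* and .* let the pattern consume the whole input -->
    <xsl:analyze-string select="$zipandcity"
        regex="\D*(\d{{4}}) ([A-Za-zÆØÅæøå]{{3,24}}).*">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:text>fail</xsl:text>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Example call with the value from the question -->
  <xsl:template match="/">
    <xsl:value-of select="my:get-zip('DK-2640 København SV')"/>
  </xsl:template>
</xsl:stylesheet>
```

With the sample value the whole string is one match, so only regex-group(1), i.e. 2640, is returned. An input that does not fit the pattern at all still falls through to the fail branch.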
The regex attribute is parsed as an attribute value template, where curly braces have a special meaning. If this is in fact an XSLT 2.0 stylesheet, you need to escape the curly braces in the regex attribute by doubling them: (\d{{4}}) ([A-Za-zÆØÅæøå]{{3,24}})
Alternatively you could define a variable containing your pattern like this:
<xsl:variable name="pattern">(\d{4}) ([A-Za-zÆØÅæøå]{3,24})</xsl:variable>
<xsl:analyze-string select="$zipandcity" regex="{$pattern}">
Regular expressions are only supported in XSLT 2.x -- not in XSLT 1.0.