Formatting string (Removing leading zeros)

I am new to XSLT. My requirement is to transform an XML file into a text file according to business specifications, and I am stuck on one string-formatting issue. Please help if you have any ideas.
Here is part of the input XML data:
"0001295"
Expected result to print into the text file:
1295
My main issue is removing the leading zeros. Please share any logic or function that does this.

Just use this simple expression:
number(.)
Here is a complete example:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="t">
<xsl:value-of select="number(.)"/>
</xsl:template>
</xsl:stylesheet>
When applied on this XML document:
<t>0001295</t>
the wanted, correct result is produced:
1295
Alternatively, use format-number():
format-number(., '#')
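One caveat: number() returns NaN when the string is not a valid number. A minimal XSLT 1.0 sketch that guards against this, reusing the t element from the example above, relies on the fact that NaN is never equal to itself:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="t">
    <xsl:choose>
      <!-- number(.) = number(.) is false only when the value is NaN -->
      <xsl:when test="number(.) = number(.)">
        <xsl:value-of select="number(.)"/>
      </xsl:when>
      <!-- not numeric: emit the original text unchanged -->
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

For <t>0001295</t> this prints 1295, while a value such as ASN0012345 is printed as it is instead of NaN.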

There are a couple of ways you can do this. If the value is entirely numeric (for example, not a CSV line or part of a product code such as ASN0012345), you can convert from a string to a number and back to a string again:
string(number($value))
Otherwise, just replace the zeros at the start (this needs XSLT 2.0):
replace( $value, '^0+', '' )
The '^' anchor is required (standard regex syntax); without it, a value of 001201 becomes 121 (all zeros removed). Use '0+' rather than '0*' here, because replace() raises an error when the pattern can match a zero-length string.
Hope that helps.
Dave

Here is one way you could do it in XSLT 1.0.
First, find the first non-zero character by removing all the zeros from the value and taking the first character of what remains:
<xsl:variable name="first" select="substring(translate(., '0', ''), 1, 1)" />
Then take the substring-before this first character (the run of leading zeros) and use substring-after to get everything that follows it:
<xsl:value-of select="substring-after(., substring-before(., $first))" />
Or, to combine the two statements into one
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
So, given the following input
<a>00012095Kb</a>
Then using the following XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/a">
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
</xsl:template>
</xsl:stylesheet>
The following will be output
12095Kb

As a simple alternative in XSLT 2.0 that can be used with numeric or alpha-numeric input, with or without leading zeros, you might try:
replace( $value, '^0*(..*)', '$1' )
This works because ^0* is greedy and (..*) captures the rest of the input after the last leading zero. $1 refers to the captured group.
Note that an input containing only zeros will output 0.
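As a sketch of how this expression might sit in a complete stylesheet (assuming an XSLT 2.0 processor and the t element used as sample input earlier in this thread):

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- t is the sample element name used earlier in this thread -->
  <xsl:template match="t">
    <!-- ^0* consumes the leading zeros; (..*) keeps at least one character -->
    <xsl:value-of select="replace(., '^0*(..*)', '$1')"/>
  </xsl:template>
</xsl:stylesheet>
```

With this, 0001295 becomes 1295, 00012095Kb becomes 12095Kb, and 000 becomes 0.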

In XSLT 2.0, you can remove leading zeros from a string with:
<xsl:value-of select="replace( $value, '^0+', '')"/>

You could use a recursive template that will remove the leading zeros:
<xsl:template name="remove-leading-zeros">
<xsl:param name="text"/>
<xsl:choose>
<xsl:when test="starts-with($text,'0')">
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text"
select="substring-after($text,'0')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Invoke it like this:
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text" select="/path/to/node/with/leading/zeros"/>
</xsl:call-template>

<xsl:value-of select="number(.) * 1"/>
works for me

Many XSLT 1.0 processors, such as libxslt (the XSLT library built on libxml2), let the calling language register extension functions, so we can make use of that. Suppose the language that calls XSLT is PHP: see this wikibook about registerPHPFunctions.
The built-in PHP function ltrim can then be used in
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://php.net/xsl">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="test">
show "<xsl:value-of select="fn:function('ltrim',string(.),'0')" />",
</xsl:template>
</xsl:stylesheet>
Now imagine a slightly more complex problem: left-trimming zeros in a string that contains more than one number, e.g. hello 002 and 021, bye.
The solution is the same: use registerPHPFunctions, but replace the built-in function with a user-defined one,
function ltrim0_Multi($s) {
return preg_replace('/(^0+|(?<= )0+)(?=[1-9])/','',$s);
}
which converts the example into hello 2 and 21, bye.

Related

What encoding can be used in XSLT to support only basic Latin alphabet characters?

I am looking for the correct encoding to use in XSLT when processing my XML.
My requirements are:
The output text file must not contain any special characters or UTF-8.
Only the modern English alphabet is supported: the Latin-based alphabet of 26 letters, the same letters found in the basic modern Latin alphabet.
I tried encoding="ISO 8859-1" and encoding="ISO 8859-15".
Can someone tell me the correct encoding if the above are wrong?
Thanks,
Jagan

As @EiríkrÚtlendi suggested in the comments, sanitize/check your output in the XSLT.
You can create a function with a single parameter that checks for an invalid character...
XML Input
<elem>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</elem>
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:so="StackOverflow Example">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="elem">
<xsl:value-of select="so:out(.)"/>
</xsl:template>
<xsl:function name="so:out">
<xsl:param name="str"/>
<xsl:if test="matches($str,'[^\p{L}]')">
<xsl:message terminate="yes">
<xsl:value-of
select="
concat('Invalid character in &quot;',
$str, '&quot;.')"
/>
</xsl:message>
</xsl:if>
<xsl:value-of select="$str"/>
</xsl:function>
</xsl:stylesheet>
Text Output
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
If you add any other character to the elem element in the input, you'll get the following message (I added a space to make it fail):
Invalid character in "ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz".
You could also check it character by character...
<xsl:function name="so:out">
<xsl:param name="str"/>
<xsl:for-each select="string-to-codepoints($str)">
<xsl:if test="matches(codepoints-to-string(.),'[^\p{L}]')">
<xsl:message terminate="yes">
<xsl:value-of
select="
concat('Invalid character (&quot;',
codepoints-to-string(.),
'&quot;) in &quot;',
$str, '&quot;.')"
/>
</xsl:message>
</xsl:if>
</xsl:for-each>
<xsl:value-of select="$str"/>
</xsl:function>
which would produce the message:
Invalid character (" ") in "ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz".
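Note that \p{L} matches letters from any script, so a character such as é passes the checks above. If only the 26 basic Latin letters are acceptable, a variant with an explicit character class might look like the following sketch (the so:out function, its namespace, and the elem element are taken from the example above; an XSLT 2.0 processor is assumed):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:so="StackOverflow Example">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="elem">
    <xsl:value-of select="so:out(.)"/>
  </xsl:template>
  <xsl:function name="so:out">
    <xsl:param name="str"/>
    <!-- [^A-Za-z] matches any character outside the 26 basic Latin letters -->
    <xsl:if test="matches($str, '[^A-Za-z]')">
      <xsl:message terminate="yes">
        <xsl:value-of select="concat('Invalid character in &quot;', $str, '&quot;.')"/>
      </xsl:message>
    </xsl:if>
    <xsl:value-of select="$str"/>
  </xsl:function>
</xsl:stylesheet>
```

Extend the character class (for example to [^A-Za-z0-9 ]) if digits or spaces should also be allowed in the output.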

Strip prefix from attribute value

For a project, I'm stuck with XSLT-1.0/XPATH-1.0 and need a fast way to strip a lowercase prefix from attribute values.
Example attribute values are:
"cmdValue1", "gfValue2", "dTestCase3"
The values I need are:
"Value1", "Value2", "TestCase3"
I came up with this XPath expression but it is too slow for my application:
substring(@attr, 1 + string-length(substring-before(translate(@attr, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................'), '.')))
In essence, this replaces all uppercase characters with dots, then takes the substring of the original attribute value starting at the position of the first dot (the first uppercase character).
Does anyone know a shorter/faster way to do this in XSLT-1.0/XPATH-1.0?

XSLT 1.0 has few functions we could use instead, so I tried the following recursive template to avoid the translate function.
Because it is 1.5 times slower, it does not answer your question. I am posting it only to save others from trying the same thing:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xml:space="default" exclude-result-prefixes="" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
<xsl:template match="/">
<out>
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="xml/@attrib" />
</xsl:call-template>
</out>
</xsl:template>
<xsl:template name="removePrefix">
<xsl:param name="prefixedName" />
<xsl:choose>
<xsl:when test="substring-before('_abcdefghijklmnopqrstuvwxyz', substring($prefixedName, 1,1))">
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="substring($prefixedName,2)" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$prefixedName" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>

You don't need to calculate the prefix's length and manually extract the substring. Instead, just directly ask for everything that comes after it:
substring-after(@attr,
substring-before(translate(@attr,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'..........................'),
'.'))
This isn't a huge improvement, but it might shave 7-8% (based on some really rough and quick tests).
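For context, here is a minimal sketch of that expression inside a complete XSLT 1.0 stylesheet. The element name item and the attribute name attr are hypothetical and chosen only for illustration:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- 26 dots, one replacement character per uppercase letter -->
  <xsl:variable name="dots" select="'..........................'"/>
  <!-- item and attr are hypothetical names for this illustration -->
  <xsl:template match="item">
    <!-- the lowercase prefix is everything before the first uppercase letter -->
    <xsl:variable name="prefix"
      select="substring-before(translate(@attr, $upper, $dots), '.')"/>
    <xsl:value-of select="substring-after(@attr, $prefix)"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For attr="cmdValue1" the prefix is cmd and the template emits Value1. A value with no uppercase letter, or with a literal dot before the first uppercase letter, is not handled specially here.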

Check for CR or LF in XSLT

I have an input XML like this:
<in_xml>
<company>
<project>
ProjNo1
ProjNo2
ProjNo3
</project>
</company>
</in_xml>
A simple XSLT is applied to this source and writes another XML with the value of the project tag.
The project tag in the input XML has three lines here, but it could have one or more. I am looking for a way for the XSLT to read only the first line when there is more than one, and write that first line to the output XML.
The current XSLT is very simple: it just reads the project tag and outputs its value, so the code is not attached.
Regards.
I have added the answer to the question; see below @Maestro's answer.

If you are in the happy circumstances of being able to apply XSLT 2.0, the following may help:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="tokenize(normalize-space(.),' ')[1]" />
</xsl:template>
</xsl:stylesheet>
Explanation: first normalize-space() to replace all whitespace strings by a single blank (and cut off leading and trailing whitespace), then split into words, then take the first one.
In XSLT 1.0 you could use
<xsl:value-of select="substring-before(normalize-space(.), ' ')"/>
instead. Less flexible if the second word has to be selected, but for the first word it works OK.
EDIT
You asked how to retrieve the first line in XSLT 1.0. The problem here is the leading whitespace, which may contain an LF, so you cannot simply take substring-before the first LF.
The below can probably be improved upon, but it works fine:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:variable name="afterLeadingWS"
select="substring-after(., substring-before(.,substring-before(normalize-space(.), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '&#xA;')">
<xsl:value-of select="substring-before($afterLeadingWS, '&#xA;')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Explanation: first get the first word as before, then determine the whitespace before that first word, then get everything after that leading whitespace, then get the first line, which is the string before an LF character. It may happen that there is no LF except in the leading whitespace, hence the xsl:choose.

First up, thanks to @Maestro for taking the time to help me; I appreciate it.
Here is the code that I used to get the first line from a paragraph of text that has a CR:
<xsl:variable name="projNumber" select="ProjectNumber" />
<xsl:variable name="crlf" select="'&#xA;'" />
<xsl:choose>
<xsl:when test="contains($projNumber,$crlf)">
<xsl:value-of select="substring-before($projNumber,$crlf)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$projNumber"/>
</xsl:otherwise>
</xsl:choose>
This could be written as a function, but I don't know how; maybe someone can guide me. A better approach that a colleague of mine suggested is to escape the CR and use it directly in the substring function, which would avoid all those variables in the first place.
Thanks again.
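One possible reading of the colleague's suggestion, sketched here as an assumption rather than as the code actually used: write the line feed as the character reference &#xA; and append it to the value before calling substring-before, so that a match is always found and neither the variables nor the xsl:choose are needed.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- ProjectNumber is the element name used in the snippet above -->
  <xsl:template match="ProjectNumber">
    <!-- appending a line feed guarantees substring-before always finds one -->
    <xsl:value-of select="substring-before(concat(., '&#xA;'), '&#xA;')"/>
  </xsl:template>
</xsl:stylesheet>
```

This works for single-line and multi-line values alike. If the value starts with a line feed, the result is empty, so leading whitespace still has to be removed first, as in the earlier answer.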

Here's the code again, now with a function getFirstLine.
Note the additional namespace that is needed.
Also note that this does require XSLT 2.0 (xsl:function is not available in 1.0).
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://temp.com/functions">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="f:getFirstLine(.)"/>
</xsl:template>
<xsl:function name="f:getFirstLine">
<xsl:param name="input"/>
<xsl:variable name="afterLeadingWS" select="substring-after($input, substring-before($input,substring-before(normalize-space($input), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '&#xA;')">
<xsl:value-of select="substring-before($afterLeadingWS, '&#xA;')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>

How do I use a regular expression in XSLT 1.0?

I am using XSLT 1.0.
My input information may contain these values
<!--case 1-->
<attribute>123-00</attribute>
<!--case 2-->
<attribute>Abc-01</attribute>
<!--case 3-->
<attribute>--</attribute>
<!--case 4-->
<attribute>Z2-p01</attribute>
I want to find the strings that match this criterion:
if the string has at least 1 letter AND at least 1 digit,
then
do X processing
else
do Y processing
In example above, for case 1,2,4 I should be able to do X processing. For case 3, I should be able to do Y processing.
I aim to use a regular expression (in XSLT 1.0).
For all the cases, the attribute can take any value of any length.
I tried using match, but the processor returned an error.
I tried the translate function, but I'm not sure I used it the right way.
I am thinking of something like:
if String matches [a-zA-Z0-9]*
then do X processing
else
do y processing.
How do I implement that using XSLT 1.0 syntax?

This solution really works in XSLT 1.0 (and is simpler, because it doesn't and needn't use the double-translate method.):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:variable name="vUpper" select=
"'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="vLower" select=
"'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="vAlpha" select="concat($vUpper, $vLower)"/>
<xsl:variable name="vDigits" select=
"'0123456789'"/>
<xsl:template match="attribute">
<xsl:choose>
<xsl:when test=
"string-length() != string-length(translate(.,$vAlpha,''))
and
string-length() != string-length(translate(.,$vDigits,''))">
Processing X
</xsl:when>
<xsl:otherwise>
Processing Y
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
When applied to the provided XML fragment, wrapped to make it a well-formed XML document:
<t>
<!--case 1-->
<attribute>123-00</attribute>
<!--case 2-->
<attribute>Abc-01</attribute>
<!--case 3-->
<attribute>--</attribute>
<!--case 4-->
<attribute>Z2-p01</attribute>
</t>
the wanted, correct result is produced:
Processing Y
Processing X
Processing Y
Processing X
Do note: any attempt to run code like the following (borrowed from another answer to this question) on a true XSLT 1.0 processor will fail with an error:
<xsl:template match=
"attribute[
translate(.,
translate(.,
concat($upper, $lower),
''),
'')
and
translate(., translate(., $digit, ''), '')]
">
because in XSLT 1.0 it is forbidden for a match pattern to contain a variable reference.

If you found this question because you're looking for a way to use regular expressions in XSLT 1.0, and you're writing an application using Microsoft's XSLT processor, you can solve this problem by using an inline C# script.
I've written out an example and a few tips in this thread, where someone was seeking out similar functionality. It's super simple, though it may or may not be appropriate for your purposes.

XSLT 1.0 does not support regular expressions, but you can fake it.
The following stylesheet prints an X processing message for all attribute elements having a string value containing at least one number and at least one letter (and Y processing for those that do not):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="digit" select="'0123456789'"/>
<xsl:template match="attribute">
<xsl:choose>
<xsl:when test="
translate(., translate(., concat($upper, $lower), ''), '') and
translate(., translate(., $digit, ''), '')">
<xsl:message>X processing</xsl:message>
</xsl:when>
<xsl:otherwise>
<xsl:message>Y processing</xsl:message>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Note: You said this:
In example above, for case 1,2,4 I should be able to do X processing.
for case 3, I should be able to do Y processing.
But that conflicts with your requirement, because case 1 does not contain a letter. If, on the other hand, you really want to match the equivalent of [a-zA-Z0-9], then use this:
translate(., translate(., concat($upper, $lower, $digit), ''), '')
...which matches any attribute having at least one letter or number.
See the following question for more information on using translate in this way:
How to write xslt if element contains letters?

Complex XSLT split?

Is it possible to split a tag at lower to upper case boundaries i.e.
for example, tag 'UserLicenseCode' should be converted to 'User License Code'
so that the column headers look a little nicer.
I've done something like this in the past using Perl's regular expressions,
but XSLT is a whole new ball game for me.
Any pointers in creating such a template would be greatly appreciated!
Thanks
Krishna

Using recursion, it is possible to walk through a string in XSLT to evaluate every character. To do this, create a new template which accepts only one string parameter. Check the first character and if it's an uppercase character, write a space. Then write the character. Then call the template again with the remaining characters inside a single string. This would result in what you want to do.
That would be your pointer. I will need some time to work out the template. :-)
It took some testing, especially to get the space inside the whole thing. (I misused a character for this!) But this code should give you an idea...
I used this XML:
<?xml version="1.0" encoding="UTF-8"?>
<blah>UserLicenseCode</blah>
and then this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="Space">*</xsl:variable>
<xsl:template match="blah">
<xsl:variable name="Split">
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="."/>
<xsl:with-param name="First" select="true()"/>
</xsl:call-template></xsl:variable>
<xsl:value-of select="translate($Split, '*', ' ')" />
</xsl:template>
<xsl:template name="Split">
<xsl:param name="Value"/>
<xsl:param name="First" select="false()"/>
<xsl:if test="$Value!=''">
<xsl:variable name="FirstChar" select="substring($Value, 1, 1)"/>
<xsl:variable name="Rest" select="substring-after($Value, $FirstChar)"/>
<xsl:if test="not($First)">
<xsl:if test="translate($FirstChar, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................')= '.'">
<xsl:value-of select="$Space"/>
</xsl:if>
</xsl:if>
<xsl:value-of select="$FirstChar"/>
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="$Rest"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
and I got this as result:
User License Code
Do keep in mind that whitespace-only text in a stylesheet is stripped, which is why I used an '*' instead and translated it to a space afterwards. (Wrapping the space in xsl:text would also preserve it.)
Of course, this code could be improved. It's what I could come up with in 10 minutes of work. In other languages it would take fewer lines of code, but in XSLT it's still quite fast, considering the number of lines it contains.

An XSLT + FXSL solution (in XSLT 2.0, but almost the same code will work with XSLT 1.0 and FXSL 1.x):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:testmap="testmap"
exclude-result-prefixes="f testmap"
>
<xsl:import href="../f/func-str-dvc-map.xsl"/>
<testmap:testmap/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="vTestMap" select="document('')/*/testmap:*[1]"/>
'<xsl:value-of select="f:str-map($vTestMap, 'UserLicenseCode')"
/>'
</xsl:template>
<xsl:template name="mySplit" match="*[namespace-uri() = 'testmap']"
mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:value-of select=
"if(lower-case($arg1) ne $arg1)
then concat(' ', $arg1)
else $arg1
"/>
</xsl:template>
</xsl:stylesheet>
When the above transformation is applied on any source XML document (not used), the expected correct result is produced:
' User License Code'
Do note:
We are using the DVC version of the FXSL function/template str-map(). This is a Higher-order function (HOF) which takes two arguments: another function and a string. str-map() applies the function on every character of the string and returns the concatenation of the results.
Because the lower-case() function is used (in the XSLT 2.0 version), we are not constrained to only the Latin alphabet.
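If an XSLT 2.0 processor is available and no library is wanted, the same split can also be sketched with the built-in replace() function. This is an alternative technique, not part of the FXSL answer; the blah element is the sample input from the first answer:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- blah is the sample element name from the first answer -->
  <xsl:template match="blah">
    <!-- insert a space wherever a lowercase letter is followed by an uppercase one -->
    <xsl:value-of select="replace(., '(\p{Ll})(\p{Lu})', '$1 $2')"/>
  </xsl:template>
</xsl:stylesheet>
```

For UserLicenseCode this yields User License Code, without a leading space. Because \p{Ll} and \p{Lu} are Unicode categories, it is not limited to the Latin alphabet either.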
