Complex XSLT split?

Is it possible to split a tag name at lowercase-to-uppercase boundaries?
For example, the tag 'UserLicenseCode' should be converted to 'User License Code'
so that the column headers look a little nicer.
I've done something like this in the past using Perl's regular expressions,
but XSLT is a whole new ball game for me.
Any pointers on creating such a template would be greatly appreciated!
Thanks
Krishna

Using recursion, it is possible to walk through a string in XSLT and evaluate every character. To do this, create a named template that takes the string as a parameter. Check the first character: if it's an uppercase character, write a space. Then write the character itself. Then call the template again with the remaining characters. This produces the result you want.
That would be your pointer. I will need some time to work out the template. :-)
It took some testing, especially to get the space inside the whole thing. (I misused a character for this!) But this code should give you an idea...
I used this XML:
<?xml version="1.0" encoding="UTF-8"?>
<blah>UserLicenseCode</blah>
and then this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="Space">*</xsl:variable>
<xsl:template match="blah">
<xsl:variable name="Split">
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="."/>
<xsl:with-param name="First" select="true()"/>
</xsl:call-template></xsl:variable>
<xsl:value-of select="translate($Split, '*', ' ')" />
</xsl:template>
<xsl:template name="Split">
<xsl:param name="Value"/>
<xsl:param name="First" select="false()"/>
<xsl:if test="$Value!=''">
<xsl:variable name="FirstChar" select="substring($Value, 1, 1)"/>
<xsl:variable name="Rest" select="substring-after($Value, $FirstChar)"/>
<xsl:if test="not($First)">
<xsl:if test="translate($FirstChar, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................')= '.'">
<xsl:value-of select="$Space"/>
</xsl:if>
</xsl:if>
<xsl:value-of select="$FirstChar"/>
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="$Rest"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
and I got this as result:
User License Code
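Do keep in mind that whitespace-only text nodes are stripped from a stylesheet, so <xsl:variable name="Space"> </xsl:variable> would end up empty. That is why I used an '*' instead and translated it to a space afterwards (which also means any '*' already in the input becomes a space). Declaring the variable as <xsl:variable name="Space" select="' '"/> would avoid the workaround.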
Of course, this code could be improved; it's what I could come up with in 10 minutes. In other languages it would take fewer lines of code, but in XSLT it's still quite fast, considering how many lines it contains.
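To check the logic outside an XSLT processor, here is a small Python model of the Split template (a sketch, not XSLT). It emits a space directly instead of the '*' placeholder, and value[1:] stands in for substring-after($Value, $FirstChar), which gives the same result because $FirstChar is the string's own first character. The uppercase test reuses the translate() trick from the stylesheet.

```python
# Python model of the recursive "Split" template (illustrative sketch, not XSLT).
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def translate(s: str, frm: str, to: str) -> str:
    """XPath 1.0 translate(): replace chars of `frm` by the char at the same
    position in `to`; chars of `frm` with no counterpart in `to` are dropped."""
    out = []
    for ch in s:
        i = frm.find(ch)
        if i == -1:
            out.append(ch)      # not in the mapping: keep as-is
        elif i < len(to):
            out.append(to[i])   # mapped character
        # else: dropped
    return "".join(out)


def split(value: str, first: bool = True) -> str:
    """Insert a space before every uppercase letter except the very first char."""
    if value == "":
        return ""               # recursion ends on the empty string
    first_char, rest = value[0], value[1:]
    space = ""
    # Same test as the stylesheet: an uppercase letter translates to '.'
    if not first and translate(first_char, UPPER, "." * len(UPPER)) == ".":
        space = " "
    return space + first_char + split(rest, first=False)


print(split("UserLicenseCode"))  # User License Code
```

As in the XSLT version, a run of capitals such as 'XMLFile' becomes 'X M L File'.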

An XSLT + FXSL solution (in XSLT 2.0, but almost the same code will work with XSLT 1.0 and FXSL 1.x):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:testmap="testmap"
exclude-result-prefixes="f testmap"
>
<xsl:import href="../f/func-str-dvc-map.xsl"/>
<testmap:testmap/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="vTestMap" select="document('')/*/testmap:*[1]"/>
'<xsl:value-of select="f:str-map($vTestMap, 'UserLicenseCode')"
/>'
</xsl:template>
<xsl:template name="mySplit" match="*[namespace-uri() = 'testmap']"
mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:value-of select=
"if(lower-case($arg1) ne $arg1)
then concat(' ', $arg1)
else $arg1
"/>
</xsl:template>
</xsl:stylesheet>
When the above transformation is applied on any source XML document (not used), the expected correct result is produced:
' User License Code'
Do note:
We are using the DVC version of the FXSL function/template str-map(). This is a higher-order function (HOF) that takes two arguments: another function and a string. str-map() applies the function to every character of the string and returns the concatenation of the results.
Because the lower-case() function is used (in the XSLT 2.0 version), we are not constrained to only the Latin alphabet.
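The higher-order idea behind str-map() can be sketched in Python as follows. This is a model, not FXSL's code; the halving recursion is an assumption about what "DVC" (divide and conquer) means here, chosen because it keeps the recursion depth logarithmic in the string length. my_split mirrors the mySplit template, using "lower-casing changes the character" as the uppercase test, so non-Latin capitals are handled too.

```python
from typing import Callable


def str_map(fn: Callable[[str], str], s: str) -> str:
    """Apply fn to every character of s and concatenate the results."""
    if len(s) == 0:
        return ""
    if len(s) == 1:
        return fn(s)
    # Divide and conquer: map each half separately, then join.
    mid = len(s) // 2
    return str_map(fn, s[:mid]) + str_map(fn, s[mid:])


def my_split(ch: str) -> str:
    # Like mySplit: prefix a space when lower-casing changes the character.
    return " " + ch if ch.lower() != ch else ch


print(repr(str_map(my_split, "UserLicenseCode")))  # ' User License Code'
```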

Related

Strip prefix from attribute value

For a project, I'm stuck with XSLT-1.0/XPATH-1.0 and need a fast way to strip a lowercase prefix from attribute values.
Example attribute values are:
"cmdValue1", "gfValue2", "dTestCase3"
The values I need are:
"Value1", "Value2", "TestCase3"
I came up with this XPath expression but it is too slow for my application:
substring(@attr, 1 + string-length(substring-before(translate(@attr, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................'), '.')))
In essence, the above replaces all uppercase characters with dots, then takes the substring of the original attribute value starting at the position of the first dot (the first uppercase character).
Does anyone know a shorter/faster way to do this in XSLT-1.0/XPATH-1.0?
There are not many XSLT 1.0 functions we could use instead, so I tried the following recursive template to avoid the translate() function.
It is 1.5 times slower, so it does not answer your question, but it may save someone else from trying the same thing:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xml:space="default" exclude-result-prefixes="" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
<xsl:template match="/">
<out>
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="xml/@attrib" />
</xsl:call-template>
</out>
</xsl:template>
<xsl:template name="removePrefix">
<xsl:param name="prefixedName" />
<xsl:choose>
<xsl:when test="substring-before('_abcdefghijklmnopqrstuvwxyz', substring($prefixedName, 1,1))">
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="substring($prefixedName,2)" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$prefixedName" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
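For readers who want to see why the substring-before() test in removePrefix works, here is a Python model (a sketch, written as a loop where the template recurses). substring-before('_abcdefghijklmnopqrstuvwxyz', c) is non-empty exactly when c is a lowercase ASCII letter, because the leading '_' always precedes it; for any other character, and for the empty string once the name is used up, it returns ''.

```python
# Python model of the removePrefix template (illustrative sketch, not XSLT).
LOOKUP = "_abcdefghijklmnopqrstuvwxyz"


def substring_before(s: str, t: str) -> str:
    """XPath substring-before(): '' when t does not occur in s."""
    i = s.find(t)
    return s[:i] if i != -1 else ""


def remove_prefix(name: str) -> str:
    # name[:1] models substring($prefixedName, 1, 1): '' for an empty name,
    # and substring-before(LOOKUP, '') is '', so the loop stops there.
    while substring_before(LOOKUP, name[:1]) != "":
        name = name[1:]   # substring($prefixedName, 2)
    return name


for v in ("cmdValue1", "gfValue2", "dTestCase3"):
    print(remove_prefix(v))  # Value1, Value2, TestCase3 (one per line)
```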
You don't need to calculate the prefix's length and manually extract the substring. Instead, just directly ask for everything that comes after it:
substring-after(@attr,
substring-before(translate(@attr,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'..........................'),
'.'))
This isn't a huge improvement, but it might shave 7-8% (based on some really rough and quick tests).
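A quick way to convince yourself that the two expressions agree is to model them in Python (a sketch; the helper names are hypothetical stand-ins for substring-before() and substring-after(), and str.translate with equal-length tables stands in for translate()). Both return the whole value when it has no uppercase letter, since substring(s, 1) and substring-after(s, '') each return s.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TO_DOTS = str.maketrans(UPPER, "." * len(UPPER))  # translate(@attr, 'A..Z', '.....')


def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i != -1 else ""


def substring_after(s, t):
    i = s.find(t)
    return s[i + len(t):] if i != -1 else ""


def strip_original(attr):
    # substring(@attr, 1 + string-length(substring-before(translate(...), '.')))
    n = len(substring_before(attr.translate(TO_DOTS), "."))
    return attr[n:]   # substring(s, 1 + n) keeps characters from position n + 1 on


def strip_shorter(attr):
    # substring-after(@attr, substring-before(translate(...), '.'))
    return substring_after(attr, substring_before(attr.translate(TO_DOTS), "."))


for v in ("cmdValue1", "gfValue2", "dTestCase3"):
    print(strip_original(v), strip_shorter(v))
```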

parsing string in xslt

I have the following XML:
<xml>
<xref>
is determined “in prescribed manner”
</xref>
</xml>
I want to know whether XSLT 2.0 can process this and return the following result:
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
“in prescribed manner”
</xref>
</xml>
I tried a few options, like replacing the spaces and entities and then using a for-each loop, but I could not get it to work. Maybe the tokenize() function of XSLT 2.0 can be used, but I don't know how to use it. Any hint will be helpful.
@JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant, but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="left_quote" select="'&#x201C;'"/>
<xsl:variable name="right_quote" select="'&#x201D;'"/>
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote,')?(.*)')"/>
<xsl:analyze-string select="$string" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="concat(' ', normalize-space(regex-group(1)))"/>
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contained protected tokens again. So we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*|@*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="concat(' ', .)"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
There is the general assumption that white space only serves as a token delimiter and need not be preserved.
&ldquo; and &rdquo; are HTML entities that are not predefined in XML, so the input is not valid XML unless they are declared in a DTD or written as characters. In the XSLT there are variables holding the quote characters; adapt them to whatever representation your XML actually uses (for example the characters themselves, or the numeric references &#x201C; and &#x201D;). You can also eliminate the variables and put the characters right into the regular expression pattern, which simplifies it significantly.
<xsl:analyze-string> does not allow a regular expression that can match an empty string. This is a small problem, since the prefix and/or the protected token and/or the suffix may be empty. I take care of this by artificially adding a space at the beginning of the string, which allows me to match the prefix using + (at least one occurrence) instead of * (zero or more occurrences).
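If all you need is the token list, a different technique from the recursive prefix/quoted/suffix split above may be simpler: one regular-expression alternation that matches either a complete quoted phrase or a run of non-space characters. The sketch below is Python, not XSLT, and assumes the quotes are U+201C and U+201D; an unmatched opening quote is silently skipped rather than reported.

```python
import re

# Either a quoted phrase (kept whole) or a run of non-whitespace, non-quote chars.
TOKEN = re.compile(r"\u201c[^\u201d]*\u201d|[^\s\u201c]+")


def protected_tokenize(text):
    # Collapse internal whitespace in each token, as normalize-space() would.
    return [" ".join(tok.split()) for tok in TOKEN.findall(text)]


print(protected_tokenize("\n  is determined \u201cin prescribed manner\u201d\n"))
```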

XSLT: Passing URL querystring as a parameter

I know this is an old question that has been passed around SO several times but I was wondering whether anyone can expand on whether a URL that has a querystring attached to it can be stripped out via XSLT 1.0 and can be used as a parameter for later use of the XSLT transformation.
For example, I have a URL of http://www.mydomain.com/mypage.htm?param1=a&param2=b
via XSLT, I am looking for a result of something along the lines of:
<xsl:param name="param1">a</xsl:param> and <xsl:param name="param2">b</xsl:param>
where both the parameter name (param1, param2) and its value (a, b) have been extracted from the query string, so that I can use $param1 and $param2 later on, say in an if condition,
e.g. <xsl:if test="$param1 = 'a'"> comes out true, but <xsl:if test="$param1 = 'b'"> comes out false.
I have seen a similar question here: Retrieve page URL params or page URL in XSLT, which uses the str-split-to-words template, but I have not managed to get it working (possibly because I implemented it the wrong way), so any working example of how it can be done in practice would be massively beneficial.
XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns="http://www.w3.org/1999/xhtml" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ext="http://exslt.org/common">
<xsl:import href="http://fxsl.cvs.sourceforge.net/viewvc/fxsl/fxsl-xslt2/f/strSplit-to-Words.xsl"/>
<xsl:output indent="yes" method="html"/>
<xsl:template match="/">
<xsl:variable name="vwordNodes">
<xsl:call-template name="str-split-to-words">
<xsl:with-param name="pStr" select="$pQString"/>
<xsl:with-param name="pDelimiters" select="'?&amp;'"/>
</xsl:call-template>
</xsl:variable>
<xsl:apply-templates select="ext:node-set($vwordNodes)/*"/>
</xsl:template>
<xsl:template match="word">
<xsl:value-of select="."/>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
There are a few problems in your code:
.1. <xsl:import href="http://fxsl.cvs.sourceforge.net/viewvc/fxsl/fxsl-xslt2/f/strSplit-to-Words.xsl"/> I doubt that the wanted stylesheet can be imported directly from its SourceForge view page, especially taking into account that it itself imports other FXSL stylesheets. The correct way to use FXSL is to download it to the local computer and reference its stylesheets from the location where it resides on that computer.
...
.2. <xsl:with-param name="pStr" select="$pQString"/> This will produce a compilation error because you did not define the $pQString global/external parameter. You need to define this parameter at global level. It can be given a default value (for example, a particular URL) for easier testing. However, the idea of this parameter is that the invoker of the transformation passes it in.
.3. The results of the transformation are written to the output. While this is good for demonstration purposes, you want to be able to use these results later in the transformation. The way to do this is to capture the results in a variable, convert that result-tree fragment (RTF) into a regular tree with ext:node-set() in a second variable, and then reference the nodes of this last variable.
Here is an example of the code you want (provided that you have downloaded FXSL, unzipped the distribution and saved this code in the same directory as the unzipped distribution of FXSL):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common"
>
<xsl:import href="strSplit-to-Words.xsl"/>
<xsl:output indent="yes" omit-xml-declaration="yes"/>
<xsl:param name="pUrl" select=
"'http://www.mydomain.com/mypage.htm?param1=a&amp;param2=b'"/>
<xsl:param name="pQString" select=
"substring-after($pUrl, '?')"
/>
<xsl:template match="/">
<xsl:variable name="vwordNodes">
<xsl:call-template name="str-split-to-words">
<xsl:with-param name="pStr" select="$pQString"/>
<xsl:with-param name="pDelimiters"
select="'?&amp;'"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="vrtfqueryParams">
<xsl:apply-templates select="ext:node-set($vwordNodes)/*"/>
</xsl:variable>
<xsl:variable name="vqueryParams" select="ext:node-set($vrtfqueryParams)/*"/>
<xsl:value-of select="$vqueryParams/@name[. ='param1']"/>
<xsl:text> : </xsl:text>
<xsl:value-of select="$vqueryParams[@name = 'param1']"/>
<xsl:text>
</xsl:text>
<xsl:value-of select="$vqueryParams/@name[. ='param2']"/>
<xsl:text> : </xsl:text>
<xsl:value-of select="$vqueryParams[@name = 'param2']"/>
</xsl:template>
<xsl:template match="word">
<param name="{substring-before(.,'=')}">
<xsl:value-of select="substring-after(.,'=')"/>
</param>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used in this demo), the wanted, correct result (the query-string parameters, referenced by name from a results variable) is produced:
param1 : a
param2 : b
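The same splitting logic, modeled in Python as a sketch (not XSLT): take everything after '?', split on '?' and '&' as str-split-to-words does with pDelimiters, then use substring-before/after '=' for the name and value, like the word template. No URL-decoding is done, matching the stylesheet; in real Python code urllib.parse.parse_qs would handle that.

```python
import re


def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i != -1 else ""


def substring_after(s, t):
    i = s.find(t)
    return s[i + len(t):] if i != -1 else ""


def query_params(url):
    qs = substring_after(url, "?")           # like $pQString
    params = {}
    for word in re.split(r"[?&]", qs):       # like pDelimiters '?&'
        if word:                             # skip empty "words"
            params[substring_before(word, "=")] = substring_after(word, "=")
    return params


print(query_params("http://www.mydomain.com/mypage.htm?param1=a&param2=b"))
# {'param1': 'a', 'param2': 'b'}
```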

Check for CR or LF in XSLT

I have an input XML like this:
<in_xml>
<company>
<project>
ProjNo1
ProjNo2
ProjNo3
</project>
</company>
</in_xml>
A simple XSLT is applied to this source, which writes another XML with the value of the project tag.
The project tag in the input XML has three lines here, but it could have one or more. I am looking for a way for the XSLT to read only the first line when there is more than one, and write that first line to the output XML.
The current XSLT is very simple: it just reads the project tag and outputs its value, so I have not attached the code.
Regards.
I have added my solution as an answer below, after @Maestro's answer.
If you are in the happy circumstances of being able to apply XSLT 2.0, the following may help:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="tokenize(normalize-space(.),' ')[1]" />
</xsl:template>
</xsl:stylesheet>
Explanation: first normalize-space() to replace all whitespace strings by a single blank (and cut off leading and trailing whitespace), then split into words, then take the first one.
In XSLT 1.0 you could use
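<xsl:value-of select="substring-before(concat(normalize-space(.), ' '), ' ')"/>
instead. It is less flexible if a later word has to be selected, but it works for the first word; the concat() adds a trailing space so that a value consisting of a single word is returned too, rather than an empty string.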
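To see how these expressions behave, including the single-word case in XSLT 1.0, here is a Python model (a sketch; normalize_space and the other helpers are hypothetical stand-ins for the XPath functions).

```python
import re


def normalize_space(s):
    # XPath whitespace is space, tab, CR and LF only
    return " ".join(w for w in re.split(r"[ \t\r\n]+", s) if w)


def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i != -1 else ""


def first_word_xslt2(s):
    # tokenize(normalize-space(.), ' ')[1]; an empty sequence prints as ''
    ns = normalize_space(s)
    return ns.split(" ")[0] if ns else ""


def first_word_xslt1(s):
    # substring-before(normalize-space(.), ' ')
    return substring_before(normalize_space(s), " ")


def first_word_xslt1_guarded(s):
    # substring-before(concat(normalize-space(.), ' '), ' ')
    return substring_before(normalize_space(s) + " ", " ")


print(repr(first_word_xslt1("\n  ProjNo1\n")))  # '' (no space after the only word)
```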
EDIT
You asked how to retrieve the first line in XSLT 1.0. The problem here is the leading whitespace, which may contain a LF, so you cannot just take substring-before the first LF.
The below can probably be improved upon, but it works fine:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:variable name="afterLeadingWS"
select="substring-after(., substring-before(., substring-before(concat(normalize-space(.), ' '), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '&#10;')">
<xsl:value-of select="substring-before($afterLeadingWS, '&#10;')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Explanation: first get the first word as before, then determine the whitespace before that first word, then get everything after that leading whitespace, and finally get the first line, which is the string before the first LF character. There may be no LF at all after the leading whitespace, hence the xsl:choose.
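The same steps, modeled in Python as a sketch (helper names are hypothetical stand-ins for the XPath functions). This model appends a space before taking the first word (the concat() guard), so that a single-word value wrapped in line breaks still yields its line instead of ''.

```python
import re

LF = "\n"   # XML parsers normalize CR LF and lone CR line endings to LF


def normalize_space(s):
    return " ".join(w for w in re.split(r"[ \t\r\n]+", s) if w)


def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i != -1 else ""


def substring_after(s, t):
    i = s.find(t)
    return s[i + len(t):] if i != -1 else ""


def first_line(s):
    first_word = substring_before(normalize_space(s) + " ", " ")
    leading_ws = substring_before(s, first_word)        # whitespace before it
    after_leading_ws = substring_after(s, leading_ws)   # '' leading_ws gives all of s
    # xsl:choose: cut at the first LF if there is one, otherwise keep everything
    if LF in after_leading_ws:
        return substring_before(after_leading_ws, LF)
    return after_leading_ws


print(first_line("\n    ProjNo1\n    ProjNo2\n    ProjNo3\n"))  # ProjNo1
```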
First up, thanks to @Maestro for taking the time to help me; I appreciate it.
Here is the code that I used to get the first line from a paragraph of text that has a CR:
<xsl:variable name="projNumber" select="ProjectNumber" />
<!-- line feed; XML parsers normalize CR LF and lone CR line endings to LF -->
<xsl:variable name="crlf" select="'&#10;'" />
<xsl:choose>
<xsl:when test="contains($projNumber,$crlf)">
<xsl:value-of select="substring-before($projNumber,$crlf)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$projNumber"/>
</xsl:otherwise>
</xsl:choose>
This could be written as a function, but I don't know how to do that; maybe someone can guide me. A better approach that a colleague of mine suggested is to escape the CR and use it directly in the substring function, which would avoid all those variables in the first place.
Thanks again.
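Why this simpler version can need the leading-whitespace handling from the other answer is easy to show with a Python model (a sketch, not XSLT): when the element's text starts with a line break, as in the question's sample, everything before the first LF is just indentation. The model searches for LF because XML parsers normalize CR LF and lone CR line endings to LF.

```python
LF = "\n"


def first_line_naive(s):
    # contains($projNumber, $crlf) ? substring-before(...) : $projNumber
    return s[:s.index(LF)] if LF in s else s


print(repr(first_line_naive("ProjNo1\nProjNo2")))           # 'ProjNo1'
print(repr(first_line_naive("\n    ProjNo1\n    ProjNo2")))  # ''
```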
Here's the code again, now with a function getFirstLine.
Note the additional namespace that is needed.
Also note that this does require XSLT 2.0 (xsl:function is not available in 1.0).
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://temp.com/functions">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="f:getFirstLine(.)"/>
</xsl:template>
<xsl:function name="f:getFirstLine">
<xsl:param name="input"/>
<xsl:variable name="afterLeadingWS" select="substring-after($input, substring-before($input, substring-before(concat(normalize-space($input), ' '), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '&#10;')">
<xsl:value-of select="substring-before($afterLeadingWS, '&#10;')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>

Formatting string (Removing leading zeros)

I am a newbie to XSLT. My requirement is to transform an XML file into a text file as per the business specifications. I am facing a string formatting issue. Please help me out if you have any idea.
Here is the part of input xml data:
"0001295"
Expected result to print into text file:
1295
My main issue is removing the leading zeros. Please share if you have any logic/function.
Just use this simple expression:
number(.)
Here is a complete example:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="t">
<xsl:value-of select="number(.)"/>
</xsl:template>
</xsl:stylesheet>
When applied on this XML document:
<t>0001295</t>
the wanted, correct result is produced:
1295
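What number() followed by conversion back to a string does can be sketched in Python (an approximation of the XPath 1.0 rules, not a full implementation). Non-numeric input such as ASN0012345 becomes NaN, and because number() yields a double, very long digit strings lose precision, which is one reason to prefer the string-based answers for identifiers.

```python
import math
import re

# XPath 1.0 string-to-number: optional whitespace, optional '-', then digits
# with an optional fraction (or a bare fraction), then optional whitespace.
NUMBER = re.compile(r"[ \t\r\n]*(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))[ \t\r\n]*")


def xpath_number(s):
    m = NUMBER.fullmatch(s)
    return float(m.group(1)) if m else math.nan


def xpath_string(x):
    # Integer-valued numbers print without a decimal point in XPath 1.0;
    # repr() is only an approximation for other values.
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    return str(int(x)) if x == int(x) else repr(x)


print(xpath_string(xpath_number("0001295")))  # 1295
```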
II. Use format-number()
format-number(., '#')
There are a couple of ways you can do this. If the value is entirely numeric (for example, not a CSV line or part of a product code such as ASN0012345), you can convert it from a string to a number and back to a string again:
string(number($value)).
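Otherwise (in XSLT 2.0), just replace the zeros at the start:
replace( $value, '^0+', '' )
The '^' is required (standard regexp syntax), or a value of 001201 will be replaced with 121 (all zeros removed). Use 0+ rather than 0*: XPath 2.0 replace() raises an error (FORX0003) if the pattern can match an empty string.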
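The effect of the anchor is easy to check with Python's re module (a sketch; note that, unlike XPath 2.0 replace(), Python accepts patterns that can match an empty string, so it will not flag the 0* form).

```python
import re

value = "001201"
print(re.sub(r"^0+", "", value))         # 1201: only the leading zeros go
print(re.sub(r"0+", "", value))          # 121: every run of zeros goes
print(repr(re.sub(r"^0+", "", "000")))   # '': an all-zero value becomes empty
```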
Hope that helps.
Dave
Here is one way you could do it in XSLT 1.0.
First, find the first non-zero character by removing all the zeros from the value and taking the first character that remains:
<xsl:variable name="first" select="substring(translate(., '0', ''), 1, 1)" />
Then find the substring before this first character (the leading zeros), and use substring-after to get everything after them:
<xsl:value-of select="substring-after(., substring-before(., $first))" />
Or, to combine the two statements into one
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
So, given the following input
<a>00012095Kb</a>
Then using the following XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/a">
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
</xsl:template>
</xsl:stylesheet>
The following will be output
12095Kb
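Modeling the expression in Python (a sketch; str.replace('0', '') stands in for translate(., '0', '')) also shows an edge case: when the value is all zeros, $first is empty, substring-before(., '') is '' and substring-after(., '') returns the value unchanged, so '000' is output as-is.

```python
def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i != -1 else ""


def substring_after(s, t):
    i = s.find(t)
    return s[i + len(t):] if i != -1 else ""


def strip_leading_zeros(s):
    first = s.replace("0", "")[:1]   # substring(translate(., '0', ''), 1, 1)
    return substring_after(s, substring_before(s, first))


print(strip_leading_zeros("00012095Kb"))  # 12095Kb
```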
As a simple alternative in XSLT 2.0 that can be used with numeric or alpha-numeric input, with or without leading zeros, you might try:
replace( $value, '^0*(..*)', '$1' )
This works because ^0* is greedy and (..*) captures the rest of the input after the last leading zero. $1 refers to the captured group.
Note that an input containing only zeros will output 0.
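The same pattern in Python's re module (a sketch; Python writes the back-reference as \1 where XPath uses $1):

```python
import re


def strip_zeros(value):
    # ^0* is greedy, but (..*) must still capture at least one character,
    # so an all-zero value keeps its last zero.
    return re.sub(r"^0*(..*)", r"\1", value)


print(strip_zeros("0001295"))  # 1295
print(strip_zeros("000"))      # 0
```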
XSLT 2.0
Remove leading zeros from STRING
<xsl:value-of select="replace( $value, '^0+', '')"/>
You could use a recursive template that will remove the leading zeros:
<xsl:template name="remove-leading-zeros">
<xsl:param name="text"/>
<xsl:choose>
<xsl:when test="starts-with($text,'0')">
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text"
select="substring-after($text,'0')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Invoke it like this:
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text" select="/path/to/node/with/leading/zeros"/>
</xsl:call-template>
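A Python model of the recursive template (a sketch, not XSLT): when the text starts with '0', substring-after($text, '0') is simply the text minus its first character. Note that this version turns an all-zero value into an empty string.

```python
def remove_leading_zeros(text):
    if text.startswith("0"):
        # substring-after($text, '0') drops exactly the first character here
        return remove_leading_zeros(text[1:])
    return text


print(remove_leading_zeros("0001295"))  # 1295
```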
<xsl:value-of select="number(.) * 1"/>
works for me
Many XSLT 1.0 processors, such as libxslt (the XSLT module of libxml2), let the calling application register extension functions, so we can use that facility. Suppose also that the language calling XSLT is PHP: see this wikibook about registerPHPFunctions.
The built-in PHP function ltrim can then be used like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://php.net/xsl">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="test">
show <xsl:value-of select="fn:function('ltrim',string(.),'0')" />",
</xsl:template>
</xsl:stylesheet>
Now imagine a slightly more complex problem: left-trimming the zeros of more than one number in a string, e.g. hello 002 and 021, bye.
The solution is the same, registerPHPFunctions, except that the built-in function is replaced by a user-defined one:
function ltrim0_Multi($s) {
return preg_replace('/(^0+|(?<= )0+)(?=[1-9])/','',$s);
}
converts the example into hello 2 and 21, bye.
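The same regular expression carries over to Python's re module unchanged (a sketch for checking the pattern outside PHP; Python also supports the fixed-width look-behind (?<= )). Zeros are only removed when a non-zero digit follows, so a standalone 000 is left alone, as are the zeros inside 100.

```python
import re

LEADING_ZEROS = re.compile(r"(^0+|(?<= )0+)(?=[1-9])")


def ltrim0_multi(s):
    # Drop zero runs at the start of the string or right after a space,
    # but only when a non-zero digit follows.
    return LEADING_ZEROS.sub("", s)


print(ltrim0_multi("hello 002 and 021, bye"))  # hello 2 and 21, bye
```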