Strip prefix from attribute value - xslt

For a project, I'm stuck with XSLT-1.0/XPATH-1.0 and need a fast way to strip a lowercase prefix from attribute values.
Example attribute values are:
"cmdValue1", "gfValue2", "dTestCase3"
The values I need are:
"Value1", "Value2", "TestCase3"
I came up with this XPath expression but it is too slow for my application:
substring(#attr, 1 + string-length(substring-before(translate(#attr, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................'), '.')))
In essence the above does replace all uppercase chars to dots, then creates a substring from the original attribute value starting from the first found dot position (first uppercase char).
Does anyone know a shorter/faster way to do this in XSLT-1.0/XPATH-1.0?

There are not many functions in XSLT 1.0 which we could use instead, so I tried the following recursive template to avoid the use of the translate function.
Because it is 1.5 times slower, it does not answer your question. I can just avoid someone trying the same thing:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xml:space="default" exclude-result-prefixes="" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
<xsl:template match="/">
<out>
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="xml/#attrib" />
</xsl:call-template>
</out>
</xsl:template>
<xsl:template name="removePrefix">
<xsl:param name="prefixedName" />
<xsl:choose>
<xsl:when test="substring-before('_abcdefghijklmnopqrstuvwxyz', substring($prefixedName, 1,1))">
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="substring($prefixedName,2)" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$prefixedName" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>

You don't need to calculate the prefix's length and manually extract the substring. Instead, just directly ask for everything that comes after it:
substring-after(#attr,
substring-before(translate(#attr,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'..........................'),
'.'))
This isn't a huge improvement, but it might shave 7-8% (based on some really rough and quick tests).

Related

Extract parts of inputs strings

I am trying to extract equipment names from strings and would like if someone could help me find a good way to do this.
My input string can either contain 1 or 2 equipment names, consisting of EQ followed 1 to 3 digits, for example :
LocationEQ3Suffix
LocationEQ5EQ8Suffix
So in the first instance I would need 'EQ3' and in the second instance I would need 'EQ5' and 'EQ8'.
I need the output to be in a text format, for example :
SomeText.EQ3
SomeText.EQ5
SomeText.EQ8
I was thinking there might be a way to do this with xsl:analyze-string and a regex like EQ[0-9]{1,3}.
Any help is appreciated.
I started something like this, but I don't think it's the right approach.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:variable name="input" select="'LocationEQ3EQ4Funct'"/>
<xsl:choose>
<!-- Case with 2 EQ -->
<xsl:when test="matches($input, 'EQ[0-9]{1,3}EQ[0-9]{1,3}')">
<xsl:value-of select="$input"/>
</xsl:when>
<!-- Case with 1 EQ -->
<xsl:otherwise>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
You say you want to use xsl:analyze-string but you're not.
An implementation using it would look something like:
<xsl:analyze-string select="input-string" regex="EQ\d{{1,3}}">
<xsl:matching-substring>
<xsl:text>SomeText.</xsl:text>
<xsl:value-of select="." />
<xsl:text>
</xsl:text>
</xsl:matching-substring>
</xsl:analyze-string>
Demo: https://xsltfiddle.liberty-development.net/a9Hk1a

parsing string in xslt

I have following xml
<xml>
<xref>
is determined “in prescribed manner”
</xref>
</xml>
I want to see if we can process xslt 2 and return the following result
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
“in prescribed manner”
</xref>
</xml>
I tried few options like replace the space and entities and then using for-each loop but not able to work it out. May be we can use tokenize function of xslt 2.0 but don't know how to use it. Any hint will be helpful.
# JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="left_quote" select="'<'"/>
<xsl:variable name="right_quote" select="'>'"/>
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote,')?(.*)')"/>
<xsl:analyze-string select="$string" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="concat(' ', normalize-space(regex-group(1)))"/>
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contained protected tokens again. So we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*|#*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="text()"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
There is the general assumption that white space only serves as a token delimiter and need not be preserved.
“ and rdquo; seem to be invalid in XML although they are valid in HTML. In the XSLT there are variables defined holding the quote characters. They will have to be adapted once you find the right XML representation. You can also eliminate the variables and put the characters right into the regular expression pattern. It will be significantly simplified by this.
<xsl:analyze-string> does not allow a regular expression which may evaluate into an empty string. This comes as a little problem since either the prefix and/or the proteced token and/or the suffix may be empty. I take care of this by artificially adding a space at the beginning of the pattern which allows me to search for the prefix using + (at least one occurence) instead of * (zero or more occurences).

Check for CR or LF in XSLT

I have a input XML like this :
<in_xml>
<company>
<project>
ProjNo1
ProjNo2
ProjNo3
</project>
</company>
</in_xml>
A simple XSLT is applied to this source, which writes another XML with the value of Project Tag.
The Project tag in input xml has three lines , it could be one or more line(s). I am looking at way for the XSLT to read only the first line, in case there are more than one and write the first line in the output xml.
The current XSLT is very simple as it just reads the Project tag and spits out the value, hence the code is not attached.
Regards.
I have added the answer to the question, see below #Maestro's answer.
If you are in the happy circumstances of being able to apply XSLT 2.0, the following may help:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="tokenize(normalize-space(.),' ')[1]" />
</xsl:template>
</xsl:stylesheet>
Explanation: first normalize-space() to replace all whitespace strings by a single blank (and cut off leading and trailing whitespace), then split into words, then take the first one.
In XSLT 1.0 you could use
<xsl:value-of select="substring-before(normalize-space(.), ' ')"/>
instead. Less flexible if the second word has to be selected, but for the first word it works OK.
EDIT
you asked how to retrieve a first line in XSLT 1.0 - problem here is the leading whitespace which may contain a LF so you cannot just substring-before the first LF.
The below can probably be improved upon, but it works fine:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:variable name="afterLeadingWS"
select="substring-after(., substring-before(.,substring-before(normalize-space(.), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '
')">
<xsl:value-of select="substring-before($afterLeadingWS, '
')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Explanation: first get first word as before, then determine the whitespace before that first word, then get everything after that leading whitespace, then get the first line, which is the string before a LF character. It may just happen that there is no LF except maybe in the leading whitespace, hence the choose function.
first up thanks to #Maestro for taking time out and helping me, appreciate your help.
Here is the code that I used to get the first line from a paragraph of text that has a CR:
<xsl:variable name="projNumber" select="ProjectNumber" />
<xsl:variable name="crlf" select="'
'" />
<xsl:choose>
<xsl:when test="contains($projNumber,$crlf)">
<xsl:value-of select="substring-before($projNumber,$crlf)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$projNumber"/>
</xsl:otherwise>
</xsl:choose>
This can be written as a function but I don't know how to do it, maybe someone can guide but there you go. A better approach that a colleague of mine suggested is to escape the CR and directly use it in substring function this would avoid all those variables in the first place.
Thanks again.
Here's the code again, now with a function getFirstLine.
Note the addtional namespace that is needed.
Also note that this does require XSLT 2.0 (xsl:function is not available in 1.0).
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://temp.com/functions">
<xsl:output method="text"/>
<xsl:template match="//project/text()">
<xsl:value-of select="f:getFirstLine(.)"/>
</xsl:template>
<xsl:function name="f:getFirstLine">
<xsl:param name="input"/>
<xsl:variable name="afterLeadingWS" select="substring-after($input, substring-before($input,substring-before(normalize-space($input), ' ')))"/>
<xsl:choose>
<xsl:when test="contains($afterLeadingWS, '
')">
<xsl:value-of select="substring-before($afterLeadingWS, '
')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$afterLeadingWS"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>

XSLT 1.0: Is there a way to get the value of a param based on another param in xslt?

I have the following params defined in my xslt file:
<xsl:param name="language">E</xsl:param>
<xsl:param name="headerTitle-E">English Title</xsl:param>
<xsl:param name="headerTitle-F">French Title</xsl:param>
How do I display the appropriate header based on the language param?
This doesn't work:
<xsl:value-of select="concat('headerTitle','-',$language)" />
It outputs "headerTitle-E" as opposed to "English Title" (which is what I want).
I'm trying to find a clean solution for displaying the appropriate text based on the language parameter, without having to use a "choose" block for every bit of text.
Any ideas?
If you now where your parameters are, you can use a single XPath. For instance, try this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="language">F</xsl:param>
<xsl:param name="headerTitle-E">English Title</xsl:param>
<xsl:param name="headerTitle-F">French Title</xsl:param>
<xsl:template match="/">
<xsl:value-of select="document('')/*/
xsl:param[#name=concat('headerTitle-',$language)]"/>
</xsl:template>
</xsl:stylesheet>
However I think that this kind of task should be better accomplished making use of lookup tables than parameters.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:empo="lookup"
exclude-result-prefixes="empo"
version="1.0">
<xsl:param name="language">F</xsl:param>
<empo:header name="headerTitle-E">English Title</empo:header>
<empo:header name="headerTitle-F">French Title</empo:header>
<xsl:template match="/">
<xsl:value-of select="document('')/*/
empo:header[#name=concat('headerTitle-',$language)]"/>
</xsl:template>
</xsl:stylesheet>
You might also want use the current header as a variable, just use:
<xsl:variable name="Header" select="document('')/*/
empo:header[#name=concat('headerTitle-',$language)]"/>
You can use the full broadth of XSLT inside xsl:param and xsl:variable. So do it like this:
<xsl:variable name="headerTitle">
<xsl:choose>
<xsl:when test="$language = 'fr'">
French
</xsl:when>
<xsl:otherwise>
English
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:value-of select="$headerTitle" />
Actually, the choose block is the clean solution, compared to creating dozens of unneeded variables.

Complex XSLT split?

Is it possible to split a tag at lower to upper case boundaries i.e.
for example, tag 'UserLicenseCode' should be converted to 'User License Code'
so that the column headers look a little nicer.
I've done something like this in the past using Perl's regular expressions,
but XSLT is a whole new ball game for me.
Any pointers in creating such a template would be greatly appreciated!
Thanks
Krishna
Using recursion, it is possible to walk through a string in XSLT to evaluate every character. To do this, create a new template which accepts only one string parameter. Check the first character and if it's an uppercase character, write a space. Then write the character. Then call the template again with the remaining characters inside a single string. This would result in what you want to do.
That would be your pointer. I will need some time to work out the template. :-)
It took some testing, especially to get the space inside the whole thing. (I misused a character for this!) But this code should give you an idea...
I used this XML:
<?xml version="1.0" encoding="UTF-8"?>
<blah>UserLicenseCode</blah>
and then this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="Space">*</xsl:variable>
<xsl:template match="blah">
<xsl:variable name="Split">
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="."/>
<xsl:with-param name="First" select="true()"/>
</xsl:call-template></xsl:variable>
<xsl:value-of select="translate($Split, '*', ' ')" />
</xsl:template>
<xsl:template name="Split">
<xsl:param name="Value"/>
<xsl:param name="First" select="false()"/>
<xsl:if test="$Value!=''">
<xsl:variable name="FirstChar" select="substring($Value, 1, 1)"/>
<xsl:variable name="Rest" select="substring-after($Value, $FirstChar)"/>
<xsl:if test="not($First)">
<xsl:if test="translate($FirstChar, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................')= '.'">
<xsl:value-of select="$Space"/>
</xsl:if>
</xsl:if>
<xsl:value-of select="$FirstChar"/>
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="$Rest"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
and I got this as result:
User License Code
Do keep in mind that spaces and other white-space characters do tend to be stripped away from XML, which is why I used an '*' instead, which I translated to a space.
Of course, this code could be improved. It's what I could come up with in 10 minutes of work. In other languages, it would take less lines of code but in XSLT it's still quite fast, considering the amount of code lines it contains.
An XSLT + FXSL solution (in XSLT 2.0, but almost the same code will work with XSLT 1.0 and FXSL 1.x:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:testmap="testmap"
exclude-result-prefixes="f testmap"
>
<xsl:import href="../f/func-str-dvc-map.xsl"/>
<testmap:testmap/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="vTestMap" select="document('')/*/testmap:*[1]"/>
'<xsl:value-of select="f:str-map($vTestMap, 'UserLicenseCode')"
/>'
</xsl:template>
<xsl:template name="mySplit" match="*[namespace-uri() = 'testmap']"
mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:value-of select=
"if(lower-case($arg1) ne $arg1)
then concat(' ', $arg1)
else $arg1
"/>
</xsl:template>
</xsl:stylesheet>
When the above transformation is applied on any source XML document (not used), the expected correct result is produced:
' User License Code'
Do note:
We are using the DVC version of the FXSL function/template str-map(). This is a Higher-order function (HOF) which takes two arguments: another function and a string. str-map() applies the function on every character of the string and returns the concatenation of the results.
Because the lower-case() function is used (in the XSLT 2.0 version), we are not constrained to only the Latin alphabet.