Extract parts of input strings - XSLT

I am trying to extract equipment names from strings and would appreciate help finding a good way to do this.
My input string can contain either 1 or 2 equipment names, each consisting of EQ followed by 1 to 3 digits, for example:
LocationEQ3Suffix
LocationEQ5EQ8Suffix
So in the first instance I would need 'EQ3' and in the second instance I would need 'EQ5' and 'EQ8'.
I need the output to be in a text format, for example:
SomeText.EQ3
SomeText.EQ5
SomeText.EQ8
I was thinking there might be a way to do this with xsl:analyze-string and a regex like EQ[0-9]{1,3}.
Any help is appreciated.
I started something like this, but I don't think it's the right approach.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:variable name="input" select="'LocationEQ3EQ4Funct'"/>
<xsl:choose>
<!-- Case with 2 EQ -->
<xsl:when test="matches($input, 'EQ[0-9]{1,3}EQ[0-9]{1,3}')">
<xsl:value-of select="$input"/>
</xsl:when>
<!-- Case with 1 EQ -->
<xsl:otherwise>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>

You say you want to use xsl:analyze-string, but you aren't using it.
An implementation using it would look something like:
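<!-- $input is the variable declared in the question's stylesheet; braces are doubled because regex is an attribute value template -->
<xsl:analyze-string select="$input" regex="EQ\d{{1,3}}">
<xsl:matching-substring>
<xsl:text>SomeText.</xsl:text>
<xsl:value-of select="." />
<xsl:text>&#10;</xsl:text>
</xsl:matching-substring>
</xsl:analyze-string>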
Demo: https://xsltfiddle.liberty-development.net/a9Hk1a
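To sanity-check the regex outside an XSLT processor, here is a rough Python equivalent. It is only an illustration, not XSLT: the function name `equipment_lines` and its default prefix are made up for the example. `re.finditer` plays the role of `xsl:matching-substring`, visiting each match in order:

```python
import re

# Same pattern as the question's EQ[0-9]{1,3}; [0-9] keeps matches to ASCII digits
# (\d in both Python and XPath regexes also matches other Unicode decimal digits).
EQUIPMENT = re.compile(r"EQ[0-9]{1,3}")

def equipment_lines(value, prefix="SomeText"):
    """Return one 'prefix.EQn' line per equipment name found in value (hypothetical helper)."""
    return [f"{prefix}.{m.group()}" for m in EQUIPMENT.finditer(value)]

for s in ("LocationEQ3Suffix", "LocationEQ5EQ8Suffix"):
    print("\n".join(equipment_lines(s)))
```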

Related

Strip prefix from attribute value

For a project, I'm stuck with XSLT 1.0/XPath 1.0 and need a fast way to strip a lowercase prefix from attribute values.
Example attribute values are:
"cmdValue1", "gfValue2", "dTestCase3"
The values I need are:
"Value1", "Value2", "TestCase3"
I came up with this XPath expression but it is too slow for my application:
substring(@attr, 1 + string-length(substring-before(translate(@attr, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................'), '.')))
In essence, the expression replaces every uppercase character with a dot, then takes the substring of the original attribute value starting at the first dot (i.e. the first uppercase character).
Does anyone know a shorter/faster way to do this in XSLT 1.0/XPath 1.0?
XSLT 1.0 offers few functions that could be used instead, so I tried the following recursive template to avoid using translate(). It turned out about 1.5 times slower, so it does not answer your question; I'm posting it only to save others from trying the same thing:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xml:space="default" exclude-result-prefixes="" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes" />
<xsl:template match="/">
<out>
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="xml/@attrib" />
</xsl:call-template>
</out>
</xsl:template>
<xsl:template name="removePrefix">
<xsl:param name="prefixedName" />
<xsl:choose>
<xsl:when test="substring-before('_abcdefghijklmnopqrstuvwxyz', substring($prefixedName, 1,1))">
<xsl:call-template name="removePrefix">
<xsl:with-param name="prefixedName" select="substring($prefixedName,2)" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$prefixedName" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
You don't need to calculate the prefix's length and manually extract the substring. Instead, just directly ask for everything that comes after it:
substring-after(@attr,
substring-before(translate(@attr,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'..........................'),
'.'))
This isn't a huge improvement, but it might shave 7-8% (based on some really rough and quick tests).
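The same logic can be mirrored in Python to check edge cases before timing it in a real processor. This is only a sketch: the helper names `substring_before`, `substring_after` and `strip_prefix` are hypothetical and reproduce the XPath 1.0 semantics (both return an empty string when the separator is missing, and `substring-after(s, '')` returns `s`). A value with no uppercase letter therefore comes back unchanged, just as with the XPath expression:

```python
import string

# Like translate(@attr, 'ABC...Z', '.....'): every ASCII uppercase letter becomes a dot.
UPPER_TO_DOTS = str.maketrans(string.ascii_uppercase, "." * 26)

def substring_before(s, sep):
    """XPath 1.0 substring-before(): '' when sep does not occur in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def substring_after(s, sep):
    """XPath 1.0 substring-after(): '' when sep does not occur; s itself when sep is ''."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def strip_prefix(value):
    # The lowercase prefix is everything before the first (translated) uppercase letter.
    prefix = substring_before(value.translate(UPPER_TO_DOTS), ".")
    return substring_after(value, prefix)

for v in ("cmdValue1", "gfValue2", "dTestCase3"):
    print(strip_prefix(v))
```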

Parsing a string in XSLT

I have the following XML:
<xml>
<xref>
is determined “in prescribed manner”
</xref>
</xml>
I want to know whether XSLT 2.0 can process this and return the following result:
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
“in prescribed manner”
</xref>
</xml>
I tried a few options, such as replacing the spaces and entities and then using a for-each loop, but couldn't get it to work. Maybe the tokenize() function of XSLT 2.0 can be used, but I don't know how. Any hint will be helpful.
@JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant, but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
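<!-- Opening and closing curly quotes (U+201C, U+201D); a bare '<' is not allowed in an attribute value -->
<xsl:variable name="left_quote" select="'&#8220;'"/>
<xsl:variable name="right_quote" select="'&#8221;'"/>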
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
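<!-- Prefix up to the first opening quote, then an optional quoted token (closing quote optional, so an unclosed quote cannot cause endless recursion), then the rest. -->
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote, '?)?(.*)')"/>
<!-- Prepend a space so the prefix group always has at least one character to match. -->
<xsl:analyze-string select="concat(' ', normalize-space($string))" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="normalize-space(regex-group(1))"/>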
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contained protected tokens again. So we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="string(.)"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
The general assumption is that whitespace only serves as a token delimiter and need not be preserved.
&ldquo; and &rdquo; are HTML entities that are not predefined in XML, so they are invalid there unless declared. In XML you can write the characters directly or use the numeric character references &#8220; and &#8221;. The quote characters are held in variables in the XSLT; adapt them if your input uses different delimiters. You can also drop the variables and put the characters directly into the regular expression pattern, which simplifies it significantly.
<xsl:analyze-string> does not allow a regular expression that can match a zero-length string. This is a small problem, since the prefix, the protected token and the suffix may each be empty. I handle this by prepending a space to the string before analyzing it, which lets the prefix be matched with + (at least one occurrence) instead of * (zero or more occurrences).
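For comparison, the same tokens can be obtained with a single regex alternation that matches either a quoted run or a run of non-space characters. This is a different technique from the recursive template above, sketched here in Python with ElementTree rather than XSLT; the names `split_xref` and `rebuild` are hypothetical, and it assumes the delimiters are U+201C and U+201D. The same alternation should also work as the regex of an `xsl:analyze-string`, emitting one `<xref>` per matching substring:

```python
import re
import xml.etree.ElementTree as ET

# Either a quoted run (U+201C ... U+201D) or a run of non-space, non-quote characters.
TOKEN = re.compile(r"\u201c[^\u201d]*\u201d|[^\s\u201c]+")

def split_xref(text):
    return TOKEN.findall(text)

def rebuild(xml_text):
    """Return a new root with one <xref> per token of the leading text of every <xref>."""
    root = ET.fromstring(xml_text)
    new_root = ET.Element(root.tag)
    for xref in root.iter("xref"):
        for token in split_xref(xref.text or ""):
            ET.SubElement(new_root, "xref").text = token
    return new_root

doc = "<xml><xref>\nis determined \u201cin prescribed manner\u201d\n</xref></xml>"
print(ET.tostring(rebuild(doc), encoding="unicode"))
```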

Convert a character if its codepoint is within a given range

I have a couple of XML files that contain Unicode characters with codepoint values between 57600 and 58607. Currently these show up in my content as square blocks, and I'd like to convert them to elements.
So what I'd like to achieve is something like:
<!-- current input -->
<p> Follow the on-screen instructions.</p>
<!-- desired output-->
<p><unichar value="58208"/> Follow the on-screen instructions.</p>
<!-- Where 58208 is the actual codepoint of the unicode character in question -->
I've experimented a bit with tokenize(), but since it needs a separator to split on, this turned out to be overcomplicated.
Any advice on how best to tackle this? I've been trying something like the following but got stuck (don't mind the syntax, I know it doesn't make any sense):
<xsl:template match="text()">
-> for every character in my string
-> if string-to-codepoints(current character) greater than 57600 return <unichar value="codepoint value"/>
else return character
</xsl:template>
It sounds like a job for analyze-string e.g.
<xsl:template match="text()">
<xsl:analyze-string select="." regex="[-]">
<xsl:matching-substring>
<unichar value="{string-to-codepoints(.)}"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
Untested.
This transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="/*">
<p>
<xsl:for-each select="string-to-codepoints(.)">
<xsl:choose>
<xsl:when test=". ge 57600 and . le 58607">
<unichar value="{.}"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="codepoints-to-string(.)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</p>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<p> Follow the on-screen instructions.</p>
produces the wanted, correct result:
<p><unichar value="58498"/> Follow the on-screen instructions.</p>
Explanation: Proper use of the standard XPath 2.0 functions string-to-codepoints() and codepoints-to-string().
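The same range check can be sketched in Python with ElementTree, mainly to show how the text around each converted character has to be split between the parent's `text` and each new element's `tail`. The function name `convert_text` is hypothetical; the bounds are the question's 57600 to 58607 (U+E100 to U+E4EF), checked inclusively, and the sample character U+E360 (58208) is chosen only for illustration:

```python
import xml.etree.ElementTree as ET

# Inclusive codepoint range from the question: U+E100 (57600) to U+E4EF (58607).
LOW, HIGH = 0xE100, 0xE4EF

def convert_text(text, tag="p"):
    """Build <tag> from text, replacing in-range characters with <unichar value="..."/>."""
    parent = ET.Element(tag)
    parent.text = ""
    last = None  # most recent <unichar>; text after it belongs in its .tail
    for ch in text:
        cp = ord(ch)
        if LOW <= cp <= HIGH:
            last = ET.SubElement(parent, "unichar", value=str(cp))
            last.tail = ""
        elif last is None:
            parent.text += ch
        else:
            last.tail += ch
    return parent

p = convert_text("\ue360 Follow the on-screen instructions.")
print(ET.tostring(p, encoding="unicode"))
```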

XSLT tokenize - capturing the separators

Here is a piece of XSLT code that tokenizes text into fragments separated by punctuation and similar characters. Is there a way to capture the strings the text was split on, for example the comma or period?
<xsl:stylesheet version="2.0" exclude-result-prefixes="xs xdt err fn" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:err="http://www.w3.org/2005/xqt-errors" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="GENERUJ">
<TEXT>
<xsl:variable name="text">
<xsl:value-of select="normalize-space(unparsed-text(@filename, 'UTF-8'))" disable-output-escaping="yes"/>
</xsl:variable>
<xsl:for-each select="tokenize($text, '(\s+(&quot;|\(|\[|\{))|((&quot;|,|;|:|\s\-|\)|\]|\})\s+)|((\.|\?|!|;)&quot;?\s*)' )">
<xsl:choose>
<xsl:when test="string-length(.)>0">
<FRAGMENT>
<CONTENT>
<xsl:value-of select="."/>
</CONTENT>
<LENGTH>
<xsl:value-of select="string-length(.)"/>
</LENGTH>
</FRAGMENT>
</xsl:when>
<xsl:otherwise>
<FRAGMENT_COUNT>
<xsl:value-of select="last()-1"/>
</FRAGMENT_COUNT>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</TEXT>
</xsl:template>
</xsl:stylesheet>
Alongside the CONTENT and LENGTH elements I construct, I'd like to add one called SEPARATOR holding the matched separator. I couldn't find an answer to this online, and since I'm just a beginner with XSLT I'm looking for a quick solution. Thank you in advance.
The tokenize() function doesn't allow you to discover what the separators were. If you need to know, you will need to use xsl:analyze-string instead. If you use the same regex as for tokenize(), this passes the "tokens" to the xsl:non-matching-substring instruction and the "separators" to the xsl:matching-substring instruction.
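To illustrate the idea, here is a Python sketch using the question's regex unchanged: `re.finditer` yields the separators (the analogue of `xsl:matching-substring`), and the text between consecutive matches is the token (the analogue of `xsl:non-matching-substring`). The names `fragments` and `to_xml` are hypothetical, and the SEPARATOR element follows the question's wish rather than any existing schema:

```python
import re
import xml.etree.ElementTree as ET

# The separator regex from the question.
SEPARATORS = re.compile(r'(\s+("|\(|\[|\{))|(("|,|;|:|\s\-|\)|\]|\})\s+)|((\.|\?|!|;)"?\s*)')

def fragments(text):
    """Yield (token, separator) pairs; the last separator is '' if the text doesn't end with one."""
    pos = 0
    for m in SEPARATORS.finditer(text):
        yield text[pos:m.start()], m.group()
        pos = m.end()
    if pos < len(text):
        yield text[pos:], ""

def to_xml(text):
    root = ET.Element("TEXT")
    for content, sep in fragments(text):
        if not content:  # like the question's string-length(.) > 0 test
            continue
        frag = ET.SubElement(root, "FRAGMENT")
        ET.SubElement(frag, "CONTENT").text = content
        ET.SubElement(frag, "LENGTH").text = str(len(content))
        ET.SubElement(frag, "SEPARATOR").text = sep
    return root

print(ET.tostring(to_xml("Hello, world. Bye"), encoding="unicode"))
```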

How to declare a variable as a link in XSLT

Is there a way to declare a link (e.g. http://www.google.com) as a variable and then use the variable in an if/else? Something like this:
<xsl:element name="a">
<xsl:attribute name="href">http://www.google.com</xsl:attribute><!-- first get the link -->
<xsl:choose>
<xsl:when test="http://www.google.com">
Do something 1
</xsl:when>
<xsl:otherwise>
Do something 2
</xsl:otherwise>
</xsl:choose>
</xsl:element>
Is this possible? What should I be looking at?
is there a way to declare a link (e.g. http://www.google.com) as a variable and then use the variable for an if/else?
Use this code as a working example -- of course you need to learn at least the basics of XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vLink" select="'http://www.google.com'"/>
<xsl:template match="/">
<xsl:choose>
<xsl:when test="$vLink = 'http://www.google.com'">
It is the Google link...
</xsl:when>
<xsl:otherwise>
It is not (exactly) the Google link...
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on any XML document (not used), the wanted result is produced:
It is the Google link...
One can also use a global <xsl:param>. This can be set externally by the invoker of the transformation.
Match against the content straightforwardly, and declare the URL as a variable.
If you need it more globally try this:
...
<xsl:apply-templates select="a" />
...
<xsl:template match="a">
Just a link
</xsl:template>
<xsl:template match="a[starts-with(@href, 'http://google.com/') or starts-with(@href, 'http://www.google.com/')]">
Link to google.com
</xsl:template>
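The prefix test can be tried outside XSLT with a quick Python sketch; the function name `describe_links` is hypothetical, and it assumes the links are `<a>` elements with an `href` attribute, as in the templates above. A missing `href` falls through to "Just a link", as `starts-with()` on an empty value would:

```python
import xml.etree.ElementTree as ET

# The two prefixes tested by the more specific template above.
GOOGLE_PREFIXES = ("http://google.com/", "http://www.google.com/")

def describe_links(xml_text):
    """Label every <a> the way the two templates would."""
    root = ET.fromstring(xml_text)
    return [
        "Link to google.com" if a.get("href", "").startswith(GOOGLE_PREFIXES) else "Just a link"
        for a in root.iter("a")
    ]

print(describe_links('<doc><a href="http://www.google.com/search"/><a href="http://example.org/"/></doc>'))
```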
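It's possible, although XSLT has no if/else instruction as such: xsl:if has no else branch, and xsl:choose with xsl:when/xsl:otherwise fills that role. Here's a version I tested, using two xsl:if tests, that you might be able to adapt to your needs. The input I used was: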
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<xml>
<LinkValue>http://www.google.com/</LinkValue>
</xml>
The XSL that showed "Do something 1" when LinkValue was the string above, or "Do something 2" when I modified it, was:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="LinkValue" select="//LinkValue"/>
<xsl:element name="a">
<xsl:attribute name="href"><xsl:value-of select="$LinkValue"/></xsl:attribute>
<xsl:if test="$LinkValue = 'http://www.google.com/'">
Do something 1
</xsl:if>
<xsl:if test="$LinkValue != 'http://www.google.com/'">
Do something 2
</xsl:if>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Hopefully you can use these samples to figure out exactly what you need to implement for your scenario.