How can I replace certain characters with their escaped variants using XSLT? - xslt

I'm trying to develop an XSLT stylesheet which transforms a given DocBook document to a file which can be fed to the lout document formatting system (which then generates PostScript output).
Doing so requires that I replace a few characters in the text of DocBook elements because they have a special meaning to lout. In particular, the characters
/ | & { } # @ ~ \ "
need to be enclosed in double quotes (") so that lout treats them as ordinary characters.
For instance, a DocBook element like
<para>This is a sample {a contrived one at that} ~ it serves no special purpose.</para>
should be transformed to
@PP
This is a sample "{"a contrived one at that"}" "~" it serves no special purpose.
How can I do this with XSLT? I'm using xsltproc, so using XPath 2.0 functions is not an option but a number of EXSLT functions are available.
I tried using a recursive template which yields the substring up to a special character (e.g. {), then the escaped character sequence ("{"), and then calls itself on the substring after the special character. However, I have a hard time making this work properly when replacing multiple characters, especially since one of them (the double quote) is used in the escape sequence itself.

In particular, the characters
/ | & { } # @ ~ \ "
need to be enclosed in double quotes
(") so that lout treats them as
ordinary characters.
I. This is most easily accomplished using the str-map template of FXSL:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:strmap="strmap"
exclude-result-prefixes="xsl f strmap">
<xsl:import href="str-dvc-map.xsl"/>
<xsl:output method="text"/>
<strmap:strmap/>
<xsl:template match="/">
<xsl:variable name="vMapFun" select="document('')/*/strmap:*[1]"/>
@PP
<xsl:call-template name="str-map">
<xsl:with-param name="pFun" select="$vMapFun"/>
<xsl:with-param name="pStr" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="escape" match="strmap:*" mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:variable name="vspecChars">/|&amp;{}#@~\"</xsl:variable>
<xsl:variable name="vEscaping" select=
"substring('&quot;', 1 div contains($vspecChars, $arg1))
"/>
<xsl:value-of select=
"concat($vEscaping, $arg1, $vEscaping)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<para>This is a sample {a contrived one at that} ~ it serves no special purpose.</para>
the wanted, correct result is produced:
@PP
This is a sample "{"a contrived one at that"}" "~" it serves no special purpose.
II. With XSLT 1.0 recursive named template:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
@PP
<xsl:call-template name="escape">
<xsl:with-param name="pStr" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="escape">
<xsl:param name="pStr" select="."/>
<xsl:param name="pspecChars">/|&amp;{}#@~\"</xsl:param>
<xsl:if test="string-length($pStr)">
<xsl:variable name="vchar1" select="substring($pStr,1,1)"/>
<xsl:variable name="vEscaping" select=
"substring('&quot;', 1 div contains($pspecChars, $vchar1))
"/>
<xsl:value-of select=
"concat($vEscaping, $vchar1, $vEscaping)"/>
<xsl:call-template name="escape">
<xsl:with-param name="pStr" select="substring($pStr,2)"/>
<xsl:with-param name="pspecChars" select="$pspecChars"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
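Both stylesheets above escape the string value of the whole document from a single template matching "/". As a minimal sketch of how the same per-character recursion could be hooked into a larger DocBook stylesheet, the variant below escapes each text node separately and emits @PP per para. The template matching para, the global variable name vSpecChars, and the use of xsl:choose instead of the substring trick are assumptions made for illustration, not part of the answer above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Characters that lout treats specially (hypothetical variable name). -->
  <xsl:variable name="vSpecChars">/|&amp;{}#@~\"</xsl:variable>

  <!-- Assumption: every DocBook para starts a lout paragraph. -->
  <xsl:template match="para">
    <xsl:text>@PP&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Each text node is escaped on its own, so element templates stay in control. -->
  <xsl:template match="text()">
    <xsl:call-template name="escape">
      <xsl:with-param name="pStr" select="."/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emit the first character (quoted if special), then recurse on the rest. -->
  <xsl:template name="escape">
    <xsl:param name="pStr"/>
    <xsl:if test="string-length($pStr)">
      <xsl:variable name="vchar1" select="substring($pStr, 1, 1)"/>
      <xsl:choose>
        <xsl:when test="contains($vSpecChars, $vchar1)">
          <xsl:text>"</xsl:text>
          <xsl:value-of select="$vchar1"/>
          <xsl:text>"</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$vchar1"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:call-template name="escape">
        <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Because the template recurses once per character, a very long text node may run into the processor's recursion depth limit; the divide-and-conquer str-map from FXSL shown in part I is designed to avoid that.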

Related

XSLT Version 1 replace single/double quotes

I am trying to replace double and single quotes with &quot; and &apos; respectively in XML.
I am very new to XSL, so I would very much appreciate it if someone could help.
For a dynamic method of replacing, it is better to create a separate template that takes the input text, the string to replace, and the replacement as parameters.
So, for example, the input text is:
Your text "contains" some "strange" characters and parts.
In the XSL example below, you can see each " being replaced with "' (the characters that &quot; and &apos; stand for):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<!--template to replace-->
<xsl:template name="template-replace">
<xsl:param name="param.str"/>
<xsl:param name="param.to.replace"/>
<xsl:param name="param.replace.with"/>
<xsl:choose>
<xsl:when test="contains($param.str,$param.to.replace)">
<xsl:value-of select="substring-before($param.str, $param.to.replace)"/>
<xsl:value-of select="$param.replace.with"/>
<xsl:call-template name="template-replace">
<xsl:with-param name="param.str" select="substring-after($param.str, $param.to.replace)"/>
<xsl:with-param name="param.to.replace" select="$param.to.replace"/>
<xsl:with-param name="param.replace.with" select="$param.replace.with"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$param.str"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="/">
<xsl:call-template name="template-replace">
<!--put your text with quotes-->
<xsl:with-param name="param.str">Your text "contains" some "strange" characters and parts.</xsl:with-param>
<!--put quote to replace-->
<xsl:with-param name="param.to.replace">"</xsl:with-param>
<!--put quot and apos to replace with-->
<xsl:with-param name="param.replace.with">"'</xsl:with-param>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
The result of the replacement is then as below:
Your text "'contains"' some "'strange"' characters and parts.
Hope it will help.

parsing string in xslt

I have following xml
<xml>
<xref>
is determined “in prescribed manner”
</xref>
</xml>
I want to see if we can process this with XSLT 2.0 and return the following result:
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
“in prescribed manner”
</xref>
</xml>
I tried a few options, such as replacing the spaces and entities and then using a for-each loop, but I was not able to work it out. Maybe we can use the tokenize() function of XSLT 2.0, but I don't know how to use it. Any hint would be helpful.
@JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="left_quote" select="'&lt;'"/>
<xsl:variable name="right_quote" select="'&gt;'"/>
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote,')?(.*)')"/>
<xsl:analyze-string select="$string" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="concat(' ', normalize-space(regex-group(1)))"/>
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contain protected tokens again, so we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*|@*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="text()"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
The general assumption is that white space only serves as a token delimiter and need not be preserved.
&ldquo; and &rdquo; seem to be invalid in XML, although they are valid in HTML. The XSLT defines variables holding the quote characters; they will have to be adapted once you find the right XML representation. You can also eliminate the variables and put the characters directly into the regular expression pattern, which simplifies it significantly.
<xsl:analyze-string> does not allow a regular expression that can match an empty string. This is a small problem, since the prefix, the protected token, and the suffix may each be empty. I take care of this by artificially adding a space at the beginning of the pattern, which allows me to search for the prefix using + (at least one occurrence) instead of * (zero or more occurrences).
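As a rough sketch of the simplification mentioned in the notes, the quote characters can be written directly into the pattern as numeric character references. This assumes the quotes are the Unicode characters U+201C and U+201D and that an XSLT 2.0 processor is used. The pattern matches either a complete quoted phrase or a run of non-whitespace characters, so it can never match an empty string.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Copy everything that is not an xref unchanged. -->
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Emit one xref per token; a quoted phrase counts as a single token.
       The quoted-phrase alternative comes first so it wins over \S+. -->
  <xsl:template match="xref">
    <xsl:analyze-string select="string(.)"
        regex="&#x201C;[^&#x201D;]*&#x201D;|\S+">
      <xsl:matching-substring>
        <xref>
          <xsl:value-of select="."/>
        </xref>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Non-matching substrings (the white space between tokens) are simply dropped, which relies on the same assumption as the first note.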

Transform an int to a char

I'd like to write the alphabet with a link for each letter. I used a template for this, but I don't know how to produce the letter itself. I tried the following, but I got an error: (A decimal representation must immediately follow the &# in a character reference).
<xsl:template name="alphabet">
<xsl:param name="iLetter"/>
<xsl:if test="$iLetter &lt; 91">
<a><xsl:attribute name="href">req.html?X_letter=&#<xsl:value-of select="$iLetter"/>;</xsl:attribute>&#<xsl:value-of select="$iLetter"/>;</xsl:attribute></a>
<xsl:call-template name="alphabet">
<xsl:with-param name="iLetter" select="number($iLetter)+1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
And I call this template like this:
<xsl:call-template name="alphabet">
<xsl:with-param name="iLetter" select="number(65)"/>
</xsl:call-template>
So, I'd like to obtain this result:
A B C D ..... X Y Z without ... of course :)
The currently accepted answer is incorrect, because it doesn't correctly produce the text child of any a element.
I. Here is a correct XSLT 1.0 solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vAlpha" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:template match="/">
<xsl:call-template name="alphabet"/>
</xsl:template>
<xsl:template name="alphabet">
<xsl:param name="pCode" select="65"/>
<xsl:if test="not($pCode > 90)">
<xsl:variable name="vChar" select=
"substring($vAlpha, $pCode - 64, 1)"/>
<a href="req.html?X_letter={$vChar}">
<xsl:value-of select="$vChar"/>
</a>
<xsl:call-template name="alphabet">
<xsl:with-param name="pCode" select="$pCode+1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on any XML document (not used), the wanted, correct result is produced:
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
II. XSLT 2.0 solution:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my:my" exclude-result-prefixes="xs my"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:output omit-xml-declaration="yes" method="xhtml" indent="yes"/>
<xsl:param name="pStart" as="xs:integer" select="65"/>
<xsl:param name="pEnd" as="xs:integer" select="90"/>
<xsl:variable name="vCodes" as="xs:integer*" select=
"for $i in $pStart to $pEnd
return $i
"/>
<xsl:template match="/">
<html>
<xsl:sequence select="my:alphabet()"/>
</html>
</xsl:template>
<xsl:function name="my:alphabet" as="element()*">
<xsl:for-each select="$vCodes">
<xsl:variable name="vChar" select="codepoints-to-string(.)"/>
<a href="req.html?X_letter={$vChar}">
<xsl:sequence select="$vChar"/>
</a>
</xsl:for-each>
</xsl:function>
</xsl:stylesheet>
As Martin suggests, it would be better to avoid using disable-output-escaping. You don't need it either if you would be satisfied with a plain ASCII character instead of the numeric character reference. If so, you can use substring() and an alphabet lookup string like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="alphabet" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:template name="alphabet">
<xsl:param name="iLetter" select="65"/>
<xsl:if test="$iLetter &lt; 91">
<a>
<xsl:attribute name="href">req.html?X_letter=<xsl:value-of select="substring($alphabet, $iLetter - 64, 1)"/></xsl:attribute>
<xsl:value-of select="substring($alphabet, $iLetter - 64, 1)"/>
</a>
<xsl:call-template name="alphabet">
<xsl:with-param name="iLetter" select="number($iLetter)+1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="/">
<xsl:call-template name="alphabet"/>
</xsl:template>
</xsl:stylesheet>
Cheers!
Inside of the a element content you could disable output escaping as in
<a href="req.html?X_letter={$iLetter}">
<xsl:value-of select="concat('&amp;#', $iLetter, ';')" disable-output-escaping="yes"/>
</a>
That approach does not work within attribute nodes, however, so I left that part to pass the character code, not the character.
Also be warned that disable-output-escaping is an optional serialization feature that is not supported by all XSLT processors. For instance, Firefox/Mozilla's built-in XSLT processor does not serialize the result tree but simply renders it, so the approach is not going to work there.
XSLT 2.0 has the function codepoints-to-string(). With many XSLT 1.0 processors it should be easy enough to implement the same function as an extension function, though it will make your code dependent on that processor.

replacing text in xml using xslt

I have an XML file which has some values in child elements as well as in attributes.
If I want to replace some text when a specific value is matched, how can I achieve it?
I tried using the XSLT translate() function, but I can't call this function for each element or attribute in the XML.
So is there any way to replace/translate the value in one shot?
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Emp1</Name>
<Age>40</Age>
<sex>M</sex>
<Address>Canada</Address>
<PersonalInformation>
<Country>Canada</Country>
<Street1>KO 92</Street1>
</PersonalInformation>
</Employee>
Output :
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Emp1</Name>
<Age>40</Age>
<sex>M</sex>
<Address>UnitedStates</Address>
<PersonalInformation>
<Country>UnitedStates</Country>
<Street1>KO 92</Street1>
</PersonalInformation>
</Employee>
In the output, the text Canada has been replaced with UnitedStates.
So, without using xslt:transform() functions on every element, I should be able to replace the text Canada with UnitedStates regardless of node level.
Wherever I find 'Canada' in the entire XML, I should be able to replace it with 'UnitedStates'.
How can I achieve this?
I. XSLT 1.0 solution:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my" >
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<my:Reps>
<rep>
<old>replace this</old>
<new>replaced</new>
</rep>
<rep>
<old>cat</old>
<new>tiger</new>
</rep>
</my:Reps>
<xsl:variable name="vReps" select=
"document('')/*/my:Reps/*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{name()}">
<xsl:call-template name="replace">
<xsl:with-param name="pText" select="."/>
</xsl:call-template>
</xsl:attribute>
</xsl:template>
<xsl:template match="text()" name="replace">
<xsl:param name="pText" select="."/>
<xsl:if test="string-length($pText)">
<xsl:choose>
<xsl:when test=
"not($vReps/old[contains($pText, .)])">
<xsl:copy-of select="$pText"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="vthisRep" select=
"$vReps/old[contains($pText, .)][1]
"/>
<xsl:variable name="vNewText">
<xsl:value-of
select="substring-before($pText, $vthisRep)"/>
<xsl:value-of select="$vthisRep/../new"/>
<xsl:value-of select=
"substring-after($pText, $vthisRep)"/>
</xsl:variable>
<xsl:call-template name="replace">
<xsl:with-param name="pText"
select="$vNewText"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<t>
<a attr1="X replace this Y">
<b>cat mouse replace this cat dog</b>
</a>
<c/>
</t>
produces the wanted, correct result:
<t>
<a attr1="X replaced Y">
<b>tiger mouse replaced tiger dog</b>
</a>
<c/>
</t>
Explanation:
The identity rule is used to copy some nodes "as-is".
We perform multiple replacements, parameterized in my:Reps.
If a text node or an attribute doesn't contain any rep target, it is copied as-is.
If a text node or an attribute contains text to be replaced (a rep target), then the replacements are done in the order specified in my:Reps.
If the string contains more than one rep target, then all targets are replaced: first all occurrences of the first rep target, then all occurrences of the second rep target, ..., and finally all occurrences of the last rep target.
II. XSLT 2.0 solution:
In XSLT 2.0 one can simply use the standard XPath 2.0 function replace(). However, for multiple replacements the solution would be still very similar to the XSLT 1.0 solution specified above.
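For the single Canada to UnitedStates replacement from the question, a minimal XSLT 2.0 sketch might look like the following. It assumes an XSLT 2.0 processor; the explicit priority attributes and the namespace attribute on xsl:attribute are additions made here to avoid ambiguous-match warnings and to keep namespaced attributes intact. Note that the second argument of replace() is a regular expression, so a search string containing characters such as . or ( would need escaping.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity rule: copy every node as-is unless a more specific rule applies. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the target string in every text node, at any depth. -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="replace(., 'Canada', 'UnitedStates')"/>
  </xsl:template>

  <!-- Rebuild every attribute with the replacement applied to its value. -->
  <xsl:template match="@*" priority="1">
    <xsl:attribute name="{name()}" namespace="{namespace-uri()}"
        select="replace(., 'Canada', 'UnitedStates')"/>
  </xsl:template>
</xsl:stylesheet>
```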

Complex XSLT split?

Is it possible to split a tag name at lowercase-to-uppercase boundaries?
For example, the tag 'UserLicenseCode' should be converted to 'User License Code'
so that the column headers look a little nicer.
I've done something like this in the past using Perl's regular expressions,
but XSLT is a whole new ball game for me.
Any pointers in creating such a template would be greatly appreciated!
Thanks
Krishna
Using recursion, it is possible to walk through a string in XSLT to evaluate every character. To do this, create a new template which accepts only one string parameter. Check the first character and if it's an uppercase character, write a space. Then write the character. Then call the template again with the remaining characters inside a single string. This would result in what you want to do.
That would be your pointer. I will need some time to work out the template. :-)
It took some testing, especially to get the space inside the whole thing. (I misused a character for this!) But this code should give you an idea...
I used this XML:
<?xml version="1.0" encoding="UTF-8"?>
<blah>UserLicenseCode</blah>
and then this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="Space">*</xsl:variable>
<xsl:template match="blah">
<xsl:variable name="Split">
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="."/>
<xsl:with-param name="First" select="true()"/>
</xsl:call-template></xsl:variable>
<xsl:value-of select="translate($Split, '*', ' ')" />
</xsl:template>
<xsl:template name="Split">
<xsl:param name="Value"/>
<xsl:param name="First" select="false()"/>
<xsl:if test="$Value!=''">
<xsl:variable name="FirstChar" select="substring($Value, 1, 1)"/>
<xsl:variable name="Rest" select="substring-after($Value, $FirstChar)"/>
<xsl:if test="not($First)">
<xsl:if test="translate($FirstChar, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................')= '.'">
<xsl:value-of select="$Space"/>
</xsl:if>
</xsl:if>
<xsl:value-of select="$FirstChar"/>
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="$Rest"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
and I got this as result:
User License Code
Do keep in mind that spaces and other white-space characters do tend to be stripped away from XML, which is why I used an '*' instead, which I translated to a space.
Of course, this code could be improved. It's what I could come up with in 10 minutes of work. In other languages it would take fewer lines of code, but in XSLT it's still quite fast, considering the number of lines it contains.
An XSLT + FXSL solution (in XSLT 2.0, but almost the same code will work with XSLT 1.0 and FXSL 1.x):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:testmap="testmap"
exclude-result-prefixes="f testmap"
>
<xsl:import href="../f/func-str-dvc-map.xsl"/>
<testmap:testmap/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="vTestMap" select="document('')/*/testmap:*[1]"/>
'<xsl:value-of select="f:str-map($vTestMap, 'UserLicenseCode')"
/>'
</xsl:template>
<xsl:template name="mySplit" match="*[namespace-uri() = 'testmap']"
mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:value-of select=
"if(lower-case($arg1) ne $arg1)
then concat(' ', $arg1)
else $arg1
"/>
</xsl:template>
</xsl:stylesheet>
When the above transformation is applied on any source XML document (not used), the expected correct result is produced:
' User License Code'
Do note:
We are using the DVC version of the FXSL function/template str-map(). This is a Higher-order function (HOF) which takes two arguments: another function and a string. str-map() applies the function on every character of the string and returns the concatenation of the results.
Because the lower-case() function is used (in the XSLT 2.0 version), we are not constrained to only the Latin alphabet.
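If FXSL is not available, a plain XSLT 2.0 alternative is a single call to the standard replace() function. The sketch below is a different technique from the str-map solution above: it inserts a space only between a lowercase letter immediately followed by an uppercase letter, using the Unicode categories \p{Ll} and \p{Lu}, so it is not restricted to the Latin alphabet either. The hard-coded input string is only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- $1 is the lowercase letter, $2 the uppercase letter that follows it. -->
  <xsl:template match="/">
    <xsl:value-of
        select="replace('UserLicenseCode', '(\p{Ll})(\p{Lu})', '$1 $2')"/>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the str-map version, this produces no leading space, and a run of consecutive capitals (as in an acronym) is left unsplit.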