I have the following XML:
<xml>
<xref>
is determined &ldquo;in prescribed manner&rdquo;
</xref>
</xml>
I want to know whether this can be processed with XSLT 2.0 to return the following result:
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
&ldquo;in prescribed manner&rdquo;
</xref>
</xml>
I tried a few options, such as replacing the spaces and entities and then using a for-each loop, but could not get it to work. Maybe the tokenize function of XSLT 2.0 could be used, but I don't know how to use it. Any hint will be helpful.
@JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="left_quote" select="'&lt;'"/>
<xsl:variable name="right_quote" select="'&gt;'"/>
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote,')?(.*)')"/>
<xsl:analyze-string select="$string" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="concat(' ', normalize-space(regex-group(1)))"/>
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contained protected tokens again. So we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*|@*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="text()"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
There is the general assumption that white space only serves as a token delimiter and need not be preserved.
&ldquo; and &rdquo; seem to be invalid in XML although they are valid in HTML. In the XSLT there are variables defined holding the quote characters. They will have to be adapted once you find the right XML representation. You can also eliminate the variables and put the characters right into the regular expression pattern, which simplifies it significantly.
<xsl:analyze-string> does not allow a regular expression that can match an empty string. This is a slight problem, since the prefix, the protected token and/or the suffix may each be empty. I take care of this by artificially adding a space at the beginning of the pattern, which allows me to search for the prefix using + (at least one occurrence) instead of * (zero or more occurrences).
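As a follow-up to the notes above, here is a shorter sketch that does the tokenizing with a single regular expression. It is untested against a real processor and assumes that the quotes reach the stylesheet as the literal characters U+201C and U+201D (for example because the entities were declared in a DTD or replaced by character references). Each match is either a whole quoted phrase or a run of non-whitespace characters; since neither alternative can match an empty string, the restriction on xsl:analyze-string mentioned above does not apply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each xref by one xref per token. -->
  <xsl:template match="xref">
    <!-- First alternative: a quoted phrase (&#8220; ... &#8221;), kept as one token.
         Second alternative: any run of non-whitespace characters. -->
    <xsl:analyze-string select="string(.)"
                        regex="&#8220;[^&#8221;]*&#8221;|\S+">
      <xsl:matching-substring>
        <xref>
          <xsl:value-of select="."/>
        </xref>
      </xsl:matching-substring>
      <!-- Whitespace between tokens is a non-match and is simply dropped. -->
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

The quoted alternative is listed first so that a phrase starting with an opening quote is taken as a whole instead of being split at its first space.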
I'm trying to solve a problem, where I have to translate strings using xslt.
I saw this: XSLT key() lookup
and this: XSLT Conditional Lookup Table
but I'm not able to get it to work. I've tried to come up with the minimal example below which shows the problems that I'm facing.
The "real" xsl is assembled from code snippets using a build process. This involves some constraints.
The inner structure of the translation lookup tables is always the same, since they are downloaded from a translation tool in flat XML format http://docs.translatehouse.org/projects/translate-toolkit/en/latest/formats/flatxml.html. I can only wrap them into distinct parent nodes, which is what I tried using the "lu" namespace.
The translation tables for all languages have to be stored inside the xsl, because different generations of xsl with different translations may exist next to each other. So no "sidecar" files.
Until now I can't get the key to work. The output of xsltproc is the following:
Setup Key - Start
German
xsltApplyOneTemplate: key was not compiled
Setup Key - End
de # skipped #
de # failed #
Expected output:
Setup Key - Start
German
Setup Key - End
de # skipped # Übersprungen
de # failed # Fehlgeschlagen
The XML file just needs to contain a root element.
So obviously the way I try to define the key depending on the target language is wrong, but my xsl knowledge has reached its limit now. The language stays the same during the transformation, so the key for all translation lookups has to be set up only once at the beginning.
The xsl Transformation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:lu="http://www.my.domain.de/lookup"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="vLanguageCode">
<!-- <xsl:value-of select="/root/@language"/> -->
<xsl:value-of select="'de'"/>
</xsl:variable>
<xsl:template match="/">
<xsl:call-template name="setupKey"/>
<xsl:call-template name="getLabel">
<xsl:with-param name="pKey" select="'skipped'"/>
</xsl:call-template>
<xsl:call-template name="getLabel">
<xsl:with-param name="pKey" select="'failed'"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="setupKey">
<xsl:message>Setup Key - Start</xsl:message>
<xsl:choose>
<xsl:when test="$vLanguageCode='DE' or $vLanguageCode='de'">
<xsl:message>German</xsl:message>
<xsl:key name="kLanguageDict" match="/lu:de/root/str" use="@key"/>
</xsl:when>
<xsl:otherwise>
<xsl:message>English (default)</xsl:message>
<xsl:key name="kLanguageDict" match="/lu:en/root/str" use="@key"/>
</xsl:otherwise>
</xsl:choose>
<xsl:message>Setup Key - End</xsl:message>
</xsl:template>
<xsl:template name="getLabel">
<xsl:param name="pKey"/>
<xsl:variable name="vResult">
<xsl:value-of select="key('kLanguageDict', $pKey)/@str"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="$vResult!=''">
<xsl:value-of select="$vResult"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$pKey"/>
</xsl:otherwise>
</xsl:choose>
<xsl:message>
<xsl:value-of select="concat($vLanguageCode, ' # ', $pKey, ' # ', $vResult)"/>
</xsl:message>
</xsl:template>
<lu:de>
<root>
<str key="skipped">Übersprungen</str>
<str key="failed">Fehlgeschlagen</str>
</root>
</lu:de>
<lu:en>
<root>
<str key="skipped">Skipped</str>
<str key="failed">Failed</str>
</root>
</lu:en>
</xsl:stylesheet>
Additions in response to the answer from @michael.hor257k:
Thank you. I didn't know that. So this means that I can't selectively define a key depending on language?
The translation system originally has one key at the top level and a translation table with interleaved entries for each language. It uses a double index (language+id) to look up the values.
I am trying to find a solution where I can embed the xml files returned by the translation management system (weblate) directly into the xsl without having to modify them. Unfortunately it looks like I'm limited in what I can get back (only default nodes and attributes).
This is the core of the original working translation lookup code:
<xsl:variable name="vLanguageDict" select="document('')/*/lu:strings"/>
<xsl:key name="kLanguageDict" match="lu:string" use="concat(@lang,@id)"/>
<xsl:template name="getLabel">
<xsl:param name="pKey"/>
<xsl:variable name="vResult">
<xsl:for-each select="$vLanguageDict">
<xsl:value-of select="key('kLanguageDict', concat($vLanguageCode,$pKey))/@value" />
</xsl:for-each>
</xsl:variable>
<xsl:choose>
<xsl:when test="$vResult!=''">
<xsl:value-of select="$vResult"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$pKey"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<lu:strings>
<lu:string lang="DE" id="skipped" value="Übersprungen"/>
<lu:string lang="EN" id="skipped" value="skipped"/>
<lu:string lang="DE" id="failed" value="Fehlgeschlagen"/>
<lu:string lang="EN" id="failed" value="failed"/>
</lu:strings>
There are two mistakes in your XSLT stylesheet that immediately jump out:
The xsl:key element is allowed only at the top level, as a child of the xsl:stylesheet element.
In XSLT 1.0, keys operate only on the current document. If you want to look up values from the stylesheet itself, you must change the context to the stylesheet document before calling the key() function. Here are two examples: https://stackoverflow.com/a/32440143/3016153
https://stackoverflow.com/a/30188334/3016153
I am afraid that's about all that can be said without a reproducible example.
--- added ---
So this means that I can't selectively define a key depending on language?
You cannot define a key conditionally - but you can define more than one key and select the one to use based on the specified language. Here's a simplified example:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dict="http://example.com/dict">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:key name="de" match="dict:de/entry" use="@key" />
<xsl:key name="en" match="dict:en/entry" use="@key" />
<xsl:param name="input">skipped</xsl:param>
<xsl:param name="lang">de</xsl:param>
<xsl:template match="/">
<xsl:value-of select="$lang"/>
<xsl:text> # </xsl:text>
<xsl:value-of select="$input"/>
<xsl:text> = </xsl:text>
<!-- switch context to stylesheet in order to use key -->
<xsl:for-each select="document('')">
<xsl:value-of select="key($lang, $input)"/>
</xsl:for-each>
</xsl:template>
<dict:de>
<entry key="skipped">Übersprungen</entry>
<entry key="failed">Fehlgeschlagen</entry>
</dict:de>
<dict:en>
<entry key="skipped">Skipped</entry>
<entry key="failed">Failed</entry>
</dict:en>
</xsl:stylesheet>
Applied to any XML input, this will return:
Result
de # skipped = Übersprungen
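For the structure from the question, the same idea might look like the sketch below. It keeps the embedded tables in their downloaded shape (root/str with a key attribute, wrapped in lu:de and lu:en), declares one key per language at the top level, and picks the key name once in a global variable. The variable name vKeyName is made up for this illustration, and the sketch has not been run through xsltproc, so treat it as a starting point.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lu="http://www.my.domain.de/lookup"
    exclude-result-prefixes="lu">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- One key per embedded table; xsl:key must be a top-level element. -->
  <xsl:key name="de" match="lu:de/root/str" use="@key"/>
  <xsl:key name="en" match="lu:en/root/str" use="@key"/>

  <xsl:param name="vLanguageCode" select="'de'"/>

  <!-- Hypothetical helper: the name of the key to use, chosen once. -->
  <xsl:variable name="vKeyName">
    <xsl:choose>
      <xsl:when test="$vLanguageCode='DE' or $vLanguageCode='de'">de</xsl:when>
      <xsl:otherwise>en</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:call-template name="getLabel">
      <xsl:with-param name="pKey" select="'skipped'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
    <xsl:call-template name="getLabel">
      <xsl:with-param name="pKey" select="'failed'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template name="getLabel">
    <xsl:param name="pKey"/>
    <xsl:variable name="vResult">
      <!-- Switch the context to the stylesheet document so that key()
           searches the embedded tables and not the input XML. -->
      <xsl:for-each select="document('')">
        <xsl:value-of select="key(string($vKeyName), $pKey)"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="$vResult != ''">
        <xsl:value-of select="$vResult"/>
      </xsl:when>
      <!-- Fall back to the lookup key itself if no translation exists. -->
      <xsl:otherwise>
        <xsl:value-of select="$pKey"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <lu:de>
    <root>
      <str key="skipped">Übersprungen</str>
      <str key="failed">Fehlgeschlagen</str>
    </root>
  </lu:de>
  <lu:en>
    <root>
      <str key="skipped">Skipped</str>
      <str key="failed">Failed</str>
    </root>
  </lu:en>
</xsl:stylesheet>
```

Note that the translated text is the content of the str element, so the lookup returns the element's string value and does not read an attribute.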
I have a couple of XML files that contain Unicode characters with codepoint values between 57600 and 58607. Currently these are shown in my content as square blocks and I'd like to convert them to elements.
So what I'd like to achieve is something like :
<!-- current input -->
<p> Follow the on-screen instructions.</p>
<!-- desired output-->
<p><unichar value="58208"/> Follow the on-screen instructions.</p>
<!-- Where 58208 is the actual codepoint of the unicode character in question -->
I've fooled around a bit with the tokenizer, but since you need a reference string to split on, this turned out to be overcomplicated.
Any advice on how best to tackle this? I've been trying things like the code below but got stuck (don't mind the syntax, I know it doesn't make any sense):
<xsl:template match="text()">
-> for every character in my string
-> if string-to-codepoints(current character) greater then 57600 return <unichar value="codepoint value"/>
else return character
</xsl:template>
It sounds like a job for analyze-string e.g.
<xsl:template match="text()">
<xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
<xsl:matching-substring>
<unichar value="{string-to-codepoints(.)}"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
Untested.
This transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="/*">
<p>
<xsl:for-each select="string-to-codepoints(.)">
<xsl:choose>
<xsl:when test=". > 57600">
<unichar value="{.}"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="codepoints-to-string(.)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</p>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<p> Follow the on-screen instructions.</p>
produces the wanted, correct result:
<p><unichar value="58498"/> Follow the on-screen instructions.</p>
Explanation: Proper use of the standard XPath 2.0 functions string-to-codepoints() and codepoints-to-string().
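If the check should cover exactly the range from the question and apply to every text node of a larger document, the same two functions can be used inside a text() template. The following is a sketch under those assumptions (both bounds inclusive, everything else copied unchanged) and has not been tested:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Identity template: copy elements, attributes, comments, etc. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Walk through each text node one codepoint at a time. -->
  <xsl:template match="text()">
    <xsl:for-each select="string-to-codepoints(.)">
      <xsl:choose>
        <!-- Codepoints in the assumed range 57600..58607 become elements. -->
        <xsl:when test=". ge 57600 and . le 58607">
          <unichar value="{.}"/>
        </xsl:when>
        <!-- All other codepoints are turned back into characters. -->
        <xsl:otherwise>
          <xsl:value-of select="codepoints-to-string(.)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The text() template is more specific than the identity template's node() pattern, so it takes priority for text nodes.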
I have the following XSLT-function that I use in a XSLT file to generate XHTML output:
<xsl:function name="local:if-not-empty">
<xsl:param name="prefix"/>
<xsl:param name="str"/>
<xsl:param name="suffix"/>
<xsl:if test="$str != ''"><xsl:value-of select="concat($prefix, $str, $suffix)"/></xsl:if>
</xsl:function>
It simply checks whether a string str is not empty and, if so, returns the string concatenated with a prefix and a suffix.
The function works fine as long as I only pass simple strings. But when I try to pass HTML elements as prefix or suffix, e.g.:
<xsl:value-of select="local:if-not-empty('', /some/xpath/expression, '<br/>')"/>
I get the following error message:
SXXP0003: Error reported by XML parser: The value of attribute "select"
associated with an element type "null" must not contain the '<' character.
The next thing I tried was to define a variable:
<xsl:variable name="br"><br/></xsl:variable>
and pass it to the function:
<xsl:value-of select="local:if-not-empty('', /some/xpath/expression, $br)"/>
but here, of course, I get an empty string, as the value of the element is extracted, and not the element itself copied.
My final hopeless attempt was to define a text element in the variable:
<xsl:variable name="br">
<xsl:text disable-output-escaping="yes"><br/></xsl:text>
</xsl:variable>
and pass this to the function, but this wasn't permitted, either.
XTSE0010: xsl:text must not contain child elements
I probably don't understand the intricate inner workings of XSLT, but in my opinion adding a <br/> element within a XSLT-transformation through a generic function seems legitimate...
Anyways... I'd appreciate if anyone could give me an alternative solution. I'd also like to understand why this doesn't work...
PS: I'm using Saxon-HE 9.4.0.1J, Java version 1.6.0_24
Try this:
<xsl:value-of select="local:if-not-empty('', /some/xpath/expression, '&lt;br/&gt;')" disable-output-escaping="yes"/>
Instead of concat, use: <xsl:copy-of> and pass as parameters items not strings:
<xsl:copy-of select="$pPrefix"/>
<xsl:copy-of select="$pStr"/>
<xsl:copy-of select="$pSuffix"/>
Here is a complete example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:local="my:local" exclude-result-prefixes="local">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vBr"><br/></xsl:variable>
<xsl:template match="/">
<xsl:sequence select="local:if-not-empty('a', 'b', $vBr/*)"/>
</xsl:template>
<xsl:function name="local:if-not-empty">
<xsl:param name="pPrefix"/>
<xsl:param name="pStr"/>
<xsl:param name="pSuffix"/>
<xsl:if test="$pStr != ''">
<xsl:copy-of select="$pPrefix"/>
<xsl:copy-of select="$pStr"/>
<xsl:copy-of select="$pSuffix"/>
</xsl:if>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used), the wanted, correct result is produced:
a b<br/>
The problem is that <br/> is not a string - it is an XML element, so it cannot be manipulated using string functions. You need a separate function like this:
<xsl:function name="local:br-if-not-empty">
<xsl:param name="prefix"/>
<xsl:param name="str"/>
<xsl:if test="$str != ''">
<xsl:value-of select="concat($prefix, $str)"/>
<br/>
</xsl:if>
</xsl:function>
or a 'trick' like this where you handle <br/> as a separate case:
<xsl:function name="local:if-not-empty">
<xsl:param name="prefix"/>
<xsl:param name="str"/>
<xsl:param name="suffix"/>
<xsl:if test="$str != ''">
<xsl:value-of select="concat($prefix, $str)"/>
<xsl:choose>
<xsl:when test="$suffix = '&lt;br/&gt;'">
<br/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$suffix"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:function>
I'm trying to develop an XSLT stylesheet which transforms a given DocBook document to a file which can be fed to the lout document formatting system (which then generates PostScript output).
Doing so requires that I replace a few characters in the text of DocBook elements because they have a special meaning to lout. In particular, the characters
/ | & { } # @ ~ \ "
need to be enclosed in double quotes (") so that lout treats them as ordinary characters.
For instance, a DocBook element like
<para>This is a sample {a contrived one at that} ~ it serves no special purpose.</para>
should be transformed to
@PP
This is a sample "{"a contrived one at that"}" "~" it serves no special purpose.
How can I do this with XSLT? I'm using xsltproc, so using XPath 2.0 functions is not an option but a number of EXSLT functions are available.
I tried using a recursive template which yields the substring up to a special character (e.g. {), then the escaped character sequence ("{") and then calls itself on the substring after the special character. However, I have a hard time making this work properly when trying to replace multiple characters, and one of them is used in the escaped sequence itself.
In particular, the characters
/ | & { } # @ ~ \ "
need to be enclosed in double quotes
(") so that lout treats them as
ordinary characters.
I. This is most easily accomplished using the str-map template of FXSL:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:strmap="strmap"
exclude-result-prefixes="xsl f strmap">
<xsl:import href="str-dvc-map.xsl"/>
<xsl:output method="text"/>
<strmap:strmap/>
<xsl:template match="/">
<xsl:variable name="vMapFun" select="document('')/*/strmap:*[1]"/>
@PP
<xsl:call-template name="str-map">
<xsl:with-param name="pFun" select="$vMapFun"/>
<xsl:with-param name="pStr" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="escape" match="strmap:*" mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:variable name="vspecChars">/|&amp;{}#@~\"</xsl:variable>
<xsl:variable name="vEscaping" select=
"substring('&quot;', 1 div contains($vspecChars, $arg1))
"/>
<xsl:value-of select=
"concat($vEscaping, $arg1, $vEscaping)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<para>This is a sample {a contrived one at that} ~ it serves no special purpose.</para>
the wanted, correct result is produced:
@PP
This is a sample "{"a contrived one at that"}" "~" it serves no special purpose.
II. With XSLT 1.0 recursive named template:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
@PP
<xsl:call-template name="escape">
<xsl:with-param name="pStr" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="escape">
<xsl:param name="pStr" select="."/>
<xsl:param name="pspecChars">/|&amp;{}#@~\"</xsl:param>
<xsl:if test="string-length($pStr)">
<xsl:variable name="vchar1" select="substring($pStr,1,1)"/>
<xsl:variable name="vEscaping" select=
"substring('&quot;', 1 div contains($pspecChars, $vchar1))
"/>
<xsl:value-of select=
"concat($vEscaping, $vchar1, $vEscaping)"/>
<xsl:call-template name="escape">
<xsl:with-param name="pStr" select="substring($pStr,2)"/>
<xsl:with-param name="pspecChars" select="$pspecChars"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Is it possible to split a tag at lower to upper case boundaries i.e.
for example, tag 'UserLicenseCode' should be converted to 'User License Code'
so that the column headers look a little nicer.
I've done something like this in the past using Perl's regular expressions,
but XSLT is a whole new ball game for me.
Any pointers in creating such a template would be greatly appreciated!
Thanks
Krishna
Using recursion, it is possible to walk through a string in XSLT to evaluate every character. To do this, create a new template which accepts only one string parameter. Check the first character and if it's an uppercase character, write a space. Then write the character. Then call the template again with the remaining characters inside a single string. This would result in what you want to do.
That would be your pointer. I will need some time to work out the template. :-)
It took some testing, especially to get the space inside the whole thing. (I misused a character for this!) But this code should give you an idea...
I used this XML:
<?xml version="1.0" encoding="UTF-8"?>
<blah>UserLicenseCode</blah>
and then this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:variable name="Space">*</xsl:variable>
<xsl:template match="blah">
<xsl:variable name="Split">
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="."/>
<xsl:with-param name="First" select="true()"/>
</xsl:call-template></xsl:variable>
<xsl:value-of select="translate($Split, '*', ' ')" />
</xsl:template>
<xsl:template name="Split">
<xsl:param name="Value"/>
<xsl:param name="First" select="false()"/>
<xsl:if test="$Value!=''">
<xsl:variable name="FirstChar" select="substring($Value, 1, 1)"/>
<xsl:variable name="Rest" select="substring-after($Value, $FirstChar)"/>
<xsl:if test="not($First)">
<xsl:if test="translate($FirstChar, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '..........................')= '.'">
<xsl:value-of select="$Space"/>
</xsl:if>
</xsl:if>
<xsl:value-of select="$FirstChar"/>
<xsl:call-template name="Split">
<xsl:with-param name="Value" select="$Rest"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
and I got this as result:
User License Code
Do keep in mind that spaces and other white-space characters do tend to be stripped away from XML, which is why I used an '*' instead, which I translated to a space.
Of course, this code could be improved. It's what I could come up with in 10 minutes of work. In other languages it would take fewer lines of code, but in XSLT it's still quite fast, considering the number of lines it contains.
An XSLT + FXSL solution (in XSLT 2.0, but almost the same code will work with XSLT 1.0 and FXSL 1.x):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:testmap="testmap"
exclude-result-prefixes="f testmap"
>
<xsl:import href="../f/func-str-dvc-map.xsl"/>
<testmap:testmap/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="vTestMap" select="document('')/*/testmap:*[1]"/>
'<xsl:value-of select="f:str-map($vTestMap, 'UserLicenseCode')"
/>'
</xsl:template>
<xsl:template name="mySplit" match="*[namespace-uri() = 'testmap']"
mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:value-of select=
"if(lower-case($arg1) ne $arg1)
then concat(' ', $arg1)
else $arg1
"/>
</xsl:template>
</xsl:stylesheet>
When the above transformation is applied on any source XML document (not used), the expected correct result is produced:
' User License Code'
Do note:
We are using the DVC version of the FXSL function/template str-map(). This is a Higher-order function (HOF) which takes two arguments: another function and a string. str-map() applies the function on every character of the string and returns the concatenation of the results.
Because the lower-case() function is used (in the XSLT 2.0 version), we are not constrained to only the Latin alphabet.
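If an XSLT 2.0 processor is available and FXSL is not, a regular expression is another way to get the same effect. This is only a sketch, using the standard replace() function with Unicode categories: a space is inserted wherever a lowercase letter (\p{Ll}) is directly followed by an uppercase letter (\p{Lu}). The literal 'UserLicenseCode' stands in for the tag name, which in a real stylesheet would probably come from something like name(.).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- $1 is the lowercase letter, $2 the uppercase letter that follows it;
         they are written back with a space in between. -->
    <xsl:value-of
        select="replace('UserLicenseCode', '(\p{Ll})(\p{Lu})', '$1 $2')"/>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the str-map version, this does not put a space before the first letter, and like it, it is not limited to the Latin alphabet. Runs of consecutive capitals (as in 'XMLFile') are left unsplit by this pattern.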