XSL analyze-string difficulty with tokenized strings - regex

I need to tokenize a string and then run analyze-string on each of the tokens. This, however, seems impossible:
"XPTY0020: Required item type of the context item for the child axis
is node(); supplied value has item type xs:string) because
analyze-string requires a node context".
This is driving me insane, because analyze-string should, well, analyze strings, so I don't understand how to get around this problem.
My (simplified) XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rows>
<row>
<field name="def">1) ἀλλά sed, vero 2) καί et 3) а cum condicionali iunctum aequiparat
аште: 4) ἵνα ut chron.</field>
</row>
<row>
<field name="def">ἡλοῦν clavo figere</field>
</row>
</rows>
and my stylesheet looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
<xsl:strip-space elements="*"/>
<xsl:output omit-xml-declaration="no" indent="yes"/>
<xsl:template match="field[@name = 'def']">
<entry>
<xsl:call-template name="sense">
<xsl:with-param name="def" select="."/>
</xsl:call-template>
</entry>
</xsl:template>
<xsl:template name="sense">
<xsl:param name="def"/>
<xsl:param name="separator" select="'\d{1,2}\)\s'"/>
<xsl:for-each select="tokenize(normalize-space($def), $separator)">
<xsl:if test="string-length(.) > 0">
<xsl:element name="sense">
<xsl:attribute name="n">
<xsl:value-of select="position() - 1"/>
</xsl:attribute>
<!--this is the problematic bit, because current() is
a string here and, paradoxically, analyze-string
cannot deal with it-->
<xsl:analyze-string select="current()"
regex="^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)">
<xsl:matching-substring>
<greek>
<xsl:value-of select="regex-group(1)"/>
<xsl:value-of select="regex-group(2)"/>
</greek>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="current()"/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:element>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Without the problematic analyze-string instruction, the above stylesheet correctly produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:xs="http://www.w3.org/2001/XMLSchema">
<sense n="1">ἀλλά sed, vero </sense>
<sense n="2">καί et </sense>
<sense n="3">а cum condicionali iunctum aequiparat аште: </sense>
<sense n="4">ἵνα ut chron.</sense>
</entry>
<entry xmlns:xs="http://www.w3.org/2001/XMLSchema">
<sense n="0">ἡλοῦν clavo figere</sense>
</entry>
The stylesheet uses the tokenize() function to separate multiple senses. Then, for each of the identified senses, I want to use analyze-string to wrap the first Greek word in <greek></greek>.
What workaround can I use to make analyze-string work on tokens, i.e. strings, rather than nodes?
Many thanks in advance!

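I think the problem is that the regex attribute is an attribute value template, so {IsGreek} is evaluated as an XPath expression (a child-axis step, which fails when the context item is a string). Your curly braces therefore need to be doubled: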
regex="^([\p{{IsGreek}}\p{{IsGreekExtended}}]+[\s]*[\p{{IsGreek}}\p{{IsGreekExtended}}]*)(.*$)"
Alternatively, define the pattern outside the instruction in a variable, e.g.
<xsl:variable name="pattern">^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)</xsl:variable>
and use regex="{$pattern}".
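To make the doubled-brace fix concrete, here is a minimal sketch of the whole template, assuming XSLT 2.0 and the input shown in the question. Two things are simplifications of mine, not part of the original stylesheet: the regex is reduced to capture only the leading run of Greek characters, and empty tokens are filtered in the select, so n starts at 1 even for entries without numbering.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"/>
  <xsl:template match="field[@name = 'def']">
    <entry>
      <!-- select is a plain XPath expression, not an AVT, so {1,2} stays single here -->
      <xsl:for-each select="tokenize(normalize-space(.), '\d{1,2}\)\s')[. != '']">
        <sense n="{position()}">
          <!-- regex IS an AVT, so every literal brace is doubled -->
          <xsl:analyze-string select="."
              regex="^([\p{{IsGreek}}\p{{IsGreekExtended}}]+)(.*)$">
            <xsl:matching-substring>
              <greek><xsl:value-of select="regex-group(1)"/></greek>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
            <!-- tokens that do not start with Greek are copied unchanged -->
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </sense>
      </xsl:for-each>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

Note that analyze-string itself is happy with a string as context item; the error came only from the undoubled braces being parsed as XPath.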

Related

How can I use xsl:if for this result?

How can I use xsl:if to output type="label"? I don't know how to write the if statement.
I'm using XSLT 1.0.
<xsl:if test="">
<xsl:attribute name="type">
<xsl:value-of select=""/>
</xsl:attribute>
</xsl:if>
This is the source:
<xxxxx type="str">label</xxxxx>
I would like the output to look like this:
<key name="xxxxx" type="label"/>
The expression you want is this, assuming you are matching the xxxxx element
<xsl:if test="@type='str'">
Note that I don't know what the rest of your XSLT looks like, or whether you were looking for something generic, but you might want to learn about Attribute Value Templates if you are creating or changing other attributes. For example:
<xsl:template match="*">
<key name="{local-name()}">
<xsl:if test="@type='str'">
<xsl:attribute name="type">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:if>
</key>
</xsl:template>
When applied to this XML
<xxxxx type="str">label</xxxxx>
The following is output
<key name="xxxxx" type="label"/>
Quoting the question: "How can I use xsl:if to output type="label"? I don't know how to write the if statement. I'm using XSLT 1.0."
When using XSLT it is rarely necessary to use any XSLT conditional instructions at all -- when using the full power of the language these can (and should) be avoided.
Here is one such solution to the problem:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="xxxxx[@type='str']">
<key name="xxxxx" type="{.}"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document (none provided!):
<t>
<a/>
<xxxxx type="str">label</xxxxx>
<b/>
<c/>
</t>
the wanted, correct result is produced:
<key name="xxxxx" type="label"/>

XSLT: Replace string with Abbreviations

I would like to know how to replace the string with the abbreviations.
My XML looks like below
<concept reltype="CONTAINS" name="Left Ventricular Major Axis Diastolic Dimension, 4-chamber view" type="NUM">
<code meaning="Left Ventricular Major Axis Diastolic Dimension, 4-chamber view" value="18074-5" schema="LN" />
<measurement value="5.7585187646">
<code value="cm" schema="UCUM" />
</measurement>
<content>
<concept reltype="HAS ACQ CONTEXT" name="Image Mode" type="CODE">
<code meaning="Image Mode" value="G-0373" schema="SRT" />
<code meaning="2D mode" value="G-03A2" schema="SRT" />
</concept>
</content>
</concept>
and I am selecting a value from the XML like this:
<xsl:value-of select="concept/measurement/code/@value"/>
Now I want to replace cm with centimeter. I have many words like this, so I would like to keep the abbreviations in an XML file and replace from that.
I saw a similar example here: "Using a Map in XSL for expanding abbreviations".
But it replaces node text, whereas my text is in an attribute. Also, it would be better for me if I could find and replace when I select the text with xsl:value-of select, instead of having a separate xsl:template. Please help. I am new to XSLT.
I have created an XSLT stylesheet (version="1.1"). For the abbreviations I created an XML file, as you mentioned:
Abbreviation.xml:
<Abbreviations>
<Abbreviation>
<Short>cm</Short>
<Full>centimeter</Full>
</Abbreviation>
<Abbreviation>
<Short>m</Short>
<Full>meter</Full>
</Abbreviation>
</Abbreviations>
XSLT:
<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="xml" />
<xsl:param name="AbbreviationDoc" select="document('Abbreviation.xml')"/>
<xsl:template match="/">
<xsl:call-template name="Convert">
<xsl:with-param name="present" select="concept/measurement/code/@value"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="Convert">
<xsl:param name="present"/>
<xsl:choose>
<xsl:when test="$AbbreviationDoc/Abbreviations/Abbreviation[Short = $present]">
<xsl:value-of select="$AbbreviationDoc/Abbreviations/Abbreviation[Short = $present]/Full"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$present"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
INPUT:
as you have given: <xsl:value-of select="concept/measurement/code/@value"/>
OUTPUT:
centimeter
You just need to extend Abbreviation.xml with the short and full values of each abbreviation, and call the 'Convert' template, passing the current value, to get the desired output.
Here is a slightly shorter version:
- with the abbreviations in the XSLT file
- using apply-templates with a mode to make usage shorter.
With XSLT 1.0, however, the node-set extension is required.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" indent="yes"/>
<xsl:variable name="abbreviations_txt">
<abbreviation abbrev="cm" >centimeter</abbreviation>
<abbreviation abbrev="m" >meter</abbreviation>
</xsl:variable>
<xsl:variable name="abbreviations" select="exsl:node-set($abbreviations_txt)" />
<xsl:template match="/">
<xsl:apply-templates select="concept/measurement/code/@value" mode="abbrev_to_text"/>
</xsl:template>
<xsl:template match="* | @*" mode="abbrev_to_text">
<xsl:variable name="abbrev" select="." />
<xsl:variable name="long_text" select="$abbreviations//abbreviation[@abbrev = $abbrev]/text()" />
<xsl:value-of select="$long_text"/>
<xsl:if test="not ($long_text)">
<xsl:value-of select="$abbrev"/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

XPath/XSLT nested predicates: how to get the context of outer predicate?

It seems that this question was not discussed on stackoverflow before, save for Working With Nested XPath Predicates ... Refined where the solution not involving nested predicates was offered.
So I tried to write the oversimplified sample of what I'd like to get:
Input:
<root>
<shortOfSupply>
<food animal="doggie"/>
<food animal="horse"/>
</shortOfSupply>
<animalsDictionary>
<cage name="A" animal="kittie"/>
<cage name="B" animal="dog"/>
<cage name="C" animal="cow"/>
<cage name="D" animal="zebra"/>
</animals>
</root>
Output:
<root>
<hungryAnimals>
<cage name="B"/>
<cage name="D"/>
</hungryAnimals>
</root>
or, alternatively, if there are no intersections,
<root>
<everythingIsFine/>
</root>
And I want to get it using nested predicates:
<xsl:template match="cage">
<cage>
<xsl:attribute name="name">
<xsl:value-of select="@name"/>
</xsl:attribute>
</cage>
</xsl:template>
<xsl:template match="/root/animalsDictionary">
<xsl:choose>
<!-- in <food> in <cage> -->
<xsl:when test="cage[/root/shortOfSupply/food[ext:isEqualAnimals(./@animal, ?????/@animal)]]">
<hungryAnimals>
<xsl:apply-templates select="cage[/root/shortOfSupply/food[ext:isEqualAnimals(@animal, ?????/@animal)]]"/>
</hungryAnimals>
</xsl:when>
<xsl:otherwise>
<everythingIsFine/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
So what should I write in place of that ?????
I know I could rewrite the entire stylesheet using one more template and extensive use of variables/params, but that makes even this stylesheet significantly more complex, let alone the real stylesheet I have for the real problem.
The XPath reference says that the dot (.) means the current context node, but it doesn't say whether there is any way to get the context node of the enclosing predicate; and I just can't believe XPath is missing this obvious feature.
XPath 2.0 one-liner:
for $a in /*/animalsDictionary/cage
return
if(/*/shortOfSupply/*[my:isA($a/@animal, @animal)])
then $a
else ()
When applied on the provided XML document selects:
<cage name="B"/>
<cage name="D"/>
One cannot use a single XPath 1.0 expression to find that a given cage contains a hungry animal.
Here is an XSLT solution (XSLT 2.0 is used only to avoid using an extension function for the comparison -- in an XSLT 1.0 solution one will use an extension function for the comparison and the xxx:node-set() extension to test if the RTF produced by applying templates in the body of the variable contains any child element):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my:my" exclude-result-prefixes="xs my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<my:Dict>
<a genName="doggie">
<name>dog</name>
<name>bulldog</name>
<name>puppy</name>
</a>
<a genName="horse">
<name>horse</name>
<name>zebra</name>
<name>pony</name>
</a>
<a genName="cat">
<name>kittie</name>
<name>kitten</name>
</a>
</my:Dict>
<xsl:variable name="vDict" select=
"document('')/*/my:Dict/a"/>
<xsl:template match="/">
<root>
<xsl:variable name="vhungryCages">
<xsl:apply-templates select=
"/*/animalsDictionary/cage"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="$vhungryCages/*">
<hungryAnimals>
<xsl:copy-of select="$vhungryCages"/>
</hungryAnimals>
</xsl:when>
<xsl:otherwise>
<everythingIsFine/>
</xsl:otherwise>
</xsl:choose>
</root>
</xsl:template>
<xsl:template match="cage">
<xsl:if test="
/*/shortOfSupply/*[my:isA(current()/@animal,@animal)]">
<cage name="{@name}"/>
</xsl:if>
</xsl:template>
<xsl:function name="my:isA" as="xs:boolean">
<xsl:param name="pSpecName" as="xs:string"/>
<xsl:param name="pGenName" as="xs:string"/>
<xsl:sequence select=
"$pSpecName = $vDict[@genName = $pGenName]/name"/>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (corrected to be well-formed):
<root>
<shortOfSupply>
<food animal="doggie"/>
<food animal="horse"/>
</shortOfSupply>
<animalsDictionary>
<cage name="A" animal="kittie"/>
<cage name="B" animal="dog"/>
<cage name="C" animal="cow"/>
<cage name="D" animal="zebra"/>
</animalsDictionary>
</root>
the wanted, correct result is produced:
<root>
<hungryAnimals>
<cage name="B"/>
<cage name="D"/>
</hungryAnimals>
</root>
Explanation: Do note the use of the XSLT current() function.
XPath 1.0 is not "relationally complete" - it can't do arbitrary joins. If you're in XSLT, you can always get round the limitations by binding variables to intermediate nodesets, or (sometimes) by using the current() function.
XPath 2.0 introduces range variables, which makes it relationally complete, so this limitation has gone.
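The variable-binding workaround mentioned above can be sketched in XSLT 1.0 as follows. This is only an illustration under stated assumptions: plain attribute equality stands in for the asker's ext:isEqualAnimals extension function, and the hungryAnimals / everythingIsFine wrapping from the question is left out to keep the sketch short.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:template match="/root">
    <root>
      <xsl:for-each select="animalsDictionary/cage">
        <!-- Bind the outer context node to a variable before entering the predicate -->
        <xsl:variable name="cage" select="."/>
        <!-- Inside the predicate, . is the food element; $cage is still the outer node -->
        <xsl:if test="/root/shortOfSupply/food[@animal = $cage/@animal]">
          <cage name="{$cage/@name}"/>
        </xsl:if>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

With the sample data from the question, plain equality would not match "doggie" against "dog", which is why the real comparison needs the extension function or a dictionary lookup in its place.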
Doesn't <xsl:when test="cage[@animal = /root/shortOfSupply/food/@animal]"> suffice to express your test condition?
Note: the dot operator in XPath refers to the current context node. In XSLT, the node being processed by the current template is given by the function current(), which most of the time (but not always) coincides with the dot (.).
You can perform the test (and the apply-templates as well) using the parent axis abbreviation (../):
cage[@animal=../../shortOfSupply/food/@animal]
Moreover, the match pattern in the first template is wrong; it should be relative to the root:
/root/animalsDictionary
@Martin's suggestion is also obviously correct.
Your final template slightly modified:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="root/animalsDictionary">
<xsl:choose>
<xsl:when test="cage[@animal=../../shortOfSupply/food/@animal]">
<hungryAnimals>
<xsl:apply-templates select="cage[@animal
=../../shortOfSupply/food/@animal]"/>
</hungryAnimals>
</xsl:when>
<xsl:otherwise>
<everythingIsFine/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="cage">
<cage name="{@name}"/>
</xsl:template>
</xsl:stylesheet>

replacing text in xml using xslt

I have an XML file which has values in child elements as well as in attributes.
If I want to replace some text whenever a specific value is matched, how can I achieve that?
I tried using the XSLT translate() function, but I can't apply it to each element or attribute in the XML individually.
So is there any way to replace/translate the value in one shot?
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Emp1</Name>
<Age>40</Age>
<sex>M</sex>
<Address>Canada</Address>
<PersonalInformation>
<Country>Canada</Country>
<Street1>KO 92</Street1>
</PersonalInformation>
</Employee>
Output :
<?xml version="1.0" encoding="UTF-8"?>
<Employee>
<Name>Emp1</Name>
<Age>40</Age>
<sex>M</sex>
<Address>UnitedStates</Address>
<PersonalInformation>
<Country>UnitedStates</Country>
<Street1>KO 92</Street1>
</PersonalInformation>
</Employee>
In the output, the text Canada has been replaced with UnitedStates.
So, without applying a function to each element individually, I should be able to replace the text Canada with UnitedStates irrespective of node level.
Wherever I find 'Canada' in the entire XML, I should be able to replace it with 'UnitedStates'.
How can I achieve this?
I. XSLT 1.0 solution:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my" >
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<my:Reps>
<rep>
<old>replace this</old>
<new>replaced</new>
</rep>
<rep>
<old>cat</old>
<new>tiger</new>
</rep>
</my:Reps>
<xsl:variable name="vReps" select=
"document('')/*/my:Reps/*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{name()}">
<xsl:call-template name="replace">
<xsl:with-param name="pText" select="."/>
</xsl:call-template>
</xsl:attribute>
</xsl:template>
<xsl:template match="text()" name="replace">
<xsl:param name="pText" select="."/>
<xsl:if test="string-length($pText)">
<xsl:choose>
<xsl:when test=
"not($vReps/old[contains($pText, .)])">
<xsl:copy-of select="$pText"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="vthisRep" select=
"$vReps/old[contains($pText, .)][1]
"/>
<xsl:variable name="vNewText">
<xsl:value-of
select="substring-before($pText, $vthisRep)"/>
<xsl:value-of select="$vthisRep/../new"/>
<xsl:value-of select=
"substring-after($pText, $vthisRep)"/>
</xsl:variable>
<xsl:call-template name="replace">
<xsl:with-param name="pText"
select="$vNewText"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<t>
<a attr1="X replace this Y">
<b>cat mouse replace this cat dog</b>
</a>
<c/>
</t>
produces the wanted, correct result:
<t>
<a attr1="X replaced Y">
<b>tiger mouse replaced tiger dog</b>
</a>
<c/>
</t>
Explanation:
The identity rule is used to copy some nodes "as-is".
We perform multiple replacements, parameterized in my:Reps.
If a text node or an attribute doesn't contain any rep target, it is copied as-is.
If a text node or an attribute contains text to be replaced (a rep target), then the replacements are done in the order specified in my:Reps.
If the string contains more than one rep target, then all targets are replaced: first all occurrences of the first rep target, then all occurrences of the second rep target, ..., and last all occurrences of the last rep target.
II. XSLT 2.0 solution:
In XSLT 2.0 one can simply use the standard XPath 2.0 function replace(). However, for multiple replacements the solution would be still very similar to the XSLT 1.0 solution specified above.
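For the single replacement asked about in the question (Canada to UnitedStates), the XSLT 2.0 approach can be sketched as below. This is a minimal illustration, assuming an XSLT 2.0 processor; the explicit priority attributes are my addition, to avoid a conflict between the identity rule and the two overriding templates.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy everything that is not overridden below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Replace inside every text node, at any depth -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="replace(., 'Canada', 'UnitedStates')"/>
  </xsl:template>
  <!-- Replace inside every attribute value -->
  <xsl:template match="@*" priority="1">
    <xsl:attribute name="{name()}" select="replace(., 'Canada', 'UnitedStates')"/>
  </xsl:template>
</xsl:stylesheet>
```

Keep in mind that the second argument of replace() is a regular expression and the third is a replacement string in which $ and \ are special, so search or replacement text containing such characters needs escaping.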

XSLT: need to replace document('')

I have the following XSLT file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- USDomesticCountryList - USE UPPERCASE LETTERS ONLY -->
<xsl:variable name="USDomesticCountryList">
<entry name="US"/>
<entry name="UK"/>
<entry name="EG"/>
</xsl:variable>
<!--// USDomesticCountryList -->
<xsl:template name="IsUSDomesticCountry">
<xsl:param name="countryParam"/>
<xsl:variable name="country" select="normalize-space($countryParam)"/>
<xsl:value-of select="normalize-space(document('')//xsl:variable[@name='USDomesticCountryList']/entry[@name=$country]/@name)"/>
</xsl:template>
</xsl:stylesheet>
I need to replace the document('') XPath function. What should I use instead?
I've tried removing it completely, but then the XSL document doesn't work for me.
I need to do so because of the following problem:
I am using an XSLT document that uses the above file; call it document a.
So document a includes the above file (document b).
I am using document a from Java code, and I cache it as a javax.xml.transform.Templates object to prevent repeated reads of the XSL file on every transformation request.
I found that document b is re-read from the hard disk, I believe because of the document('') function above, so I want to replace or remove it.
Thanks.
If you want to access the nodes inside a variable you normally use the node-set() extension function. The availability and syntax depends on the processor you use. For MSXML and Saxon you can use exsl:node-set(). To use the extension function you will have to include the namespace that defines the function.
E.g. (tested with MSXML, returns US for countryParam = 'US'):
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl"
>
<xsl:output method="xml"/>
<!-- USDomesticCountryList - USE UPPERCASE LETTERS ONLY -->
<xsl:variable name="USDomesticCountryList">
<entry name="US"/>
<entry name="UK"/>
<entry name="EG"/>
</xsl:variable>
<!--// USDomesticCountryList -->
<xsl:template name="IsUSDomesticCountry">
<xsl:param name="countryParam"/>
<xsl:variable name="country" select="normalize-space($countryParam)"/>
<xsl:value-of select="exsl:node-set($USDomesticCountryList)/entry[@name=$country]/@name"/>
</xsl:template>
</xsl:stylesheet>
If you're trying to make the IsUSDomesticCountry template work without using document(''), you could rewrite the template to
<xsl:template name="IsUSDomesticCountry">
<xsl:param name="countryParam"/>
<xsl:variable name="country" select="normalize-space($countryParam)"/>
<xsl:choose>
<xsl:when test="$country='US'">true</xsl:when>
<xsl:when test="$country='UK'">true</xsl:when>
<xsl:when test="$country='EG'">true</xsl:when>
<xsl:otherwise>false</xsl:otherwise>
</xsl:choose>
</xsl:template>
or
<xsl:template name="IsUSDomesticCountry">
<xsl:param name="countryParam"/>
<xsl:variable name="country" select="normalize-space($countryParam)"/>
<xsl:value-of select="$country='US' or $country='UK' or $country='EG'"/>
</xsl:template>
or even
<xsl:template name="IsUSDomesticCountry">
<xsl:param name="countryParam"/>
<xsl:variable name="country"
select="concat('-', normalize-space($countryParam),'-')"/>
<xsl:variable name="domesticCountries" select="'-US-UK-EG-'"/>
<xsl:value-of select="contains($domesticCountries, $country)"/>
</xsl:template>
Personally, I find the variant using document('') to be more readable.