How to find all numbers in a string - xslt

I'm trying to find all numbers in an element or string. It shouldn't matter how many numbers there are or where they occur in the string.
Here is my function:
<xsl:function name="itp:find_num">
<xsl:param name="tmp"/>
<xsl:if test="matches($tmp,'\d+')">
<xsl:analyze-string select="$tmp" regex="{'\d+'}">
<xsl:matching-substring>
<xsl:if test=". != ''">
<xsl:sequence select="."/>
</xsl:if>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="''"/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:if>
</xsl:function>
Example XML:
<address>street 12, 12345 town</address>
When I call the function, I want to choose which number to pick:
...select="itp:find_num(address)[2]"/>
For example, [2] for the postal code.
The problem is that the sequence also contains empty items, so in practice I have to use [4] to get the postal code.
Is there an easier way to solve my problem?
Or is there a way to remove all empty items from that sequence?
Thanks :-)

There is no need for complex xsl:analyze-string processing at all.
This XPath one-liner:
for $i in tokenize($pStr, '[^0-9]+')[.]
return xs:integer($i)
produces the wanted sequence of the integers in the string:
12 12345
Here is a complete transformation:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my:my">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:sequence select="my:nums(.)"/>
</xsl:template>
<xsl:function name="my:nums" as="xs:integer*">
<xsl:param name="pStr" as="xs:string"/>
<xsl:sequence select=
"for $i in tokenize($pStr, '[^0-9]+')[.]
return xs:integer($i)"/>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<address>street 12, 12345 town</address>
the wanted, correct result is produced:
12 12345
Even simpler, XPath 3.0 one-liner:
tokenize($pStr, '[^0-9]+')[.] ! xs:integer(.)
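For readers who want to check the split-and-filter logic outside an XSLT processor, here is a minimal sketch of the same idea in Python. The function name find_numbers is an illustrative assumption and not part of the answer; re.split stands in for tokenize(), and the `if t` filter plays the role of the [.] predicate that drops empty tokens.

```python
import re

def find_numbers(text):
    """Split on runs of non-digits, drop empty tokens, convert to int."""
    tokens = re.split(r"[^0-9]+", text)
    # Splitting leaves empty strings at the edges when the text starts
    # or ends with a non-digit; filter them out before converting.
    return [int(t) for t in tokens if t]
```

With the sample address, find_numbers("street 12, 12345 town") yields the two integers 12 and 12345, so the postal code is the second item.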

I would write the function as
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.org/mf"
exclude-result-prefixes="xs mf">
<xsl:function name="mf:find_num" as="xs:integer*">
<xsl:param name="input" as="xs:string"/>
<xsl:analyze-string select="$input" regex="[0-9]+">
<xsl:matching-substring>
<xsl:sequence select="xs:integer(.)"/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:function>
<xsl:template match="address">
<xsl:value-of select="mf:find_num(.)" separator=", "/>
</xsl:template>
</xsl:stylesheet>
Converting to xs:integer is optional, of course. If you want the function to return a sequence of strings containing digits, simply change it to:
<xsl:function name="mf:find_num" as="xs:string*">
<xsl:param name="input" as="xs:string"/>
<xsl:analyze-string select="$input" regex="[0-9]+">
<xsl:matching-substring>
<xsl:sequence select="."/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:function>
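Why the original function needed [4] can be illustrated with a small sketch, here in Python rather than XSLT. It assumes the behaviour described in the question: one empty item is emitted for every non-matching substring. The function names are hypothetical. The second function mirrors the fix above, which emits items for matching substrings only.

```python
import re

def with_placeholders(text):
    """Mimic the question's function: '' for each non-matching substring."""
    items = []
    for piece in re.split(r"([0-9]+)", text):
        if piece == "":
            continue  # re.split artifact; no substring exists here
        items.append(piece if re.fullmatch(r"[0-9]+", piece) else "")
    return items

def matches_only(text):
    """Mimic the answer: keep matching substrings, ignore everything else."""
    return re.findall(r"[0-9]+", text)
```

For the sample address the first function returns five items with the postal code in fourth place, while the second returns two items with the postal code in second place.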

Related

xslt2+ How to combine groups with any matching elements and remove duplicates of elements

This is my solution to combine groups with any matching elements and remove duplicates of elements.
For example, I sketched a simple input and what the output should be. If two groups have the same elements, then the groups are combined into one with all elements except repetitions.
Are there alternative approaches?
<xsl:variable name="in">
<g>
<i>8</i>
<i>2</i>
</g>
<g>
<i>2</i>
<i>4</i>
</g>
<g>
<i>4</i>
<i>5</i>
</g>
<g>
<i>6</i>
<i>7</i>
</g>
</xsl:variable>
<xsl:template match="/">
<out>
<xsl:for-each-group select="$in/g/i" group-by="k2:iin(.,$in)[1]">
<g>
<xsl:for-each-group select="current-group()" group-by=".">
<xsl:copy-of select="current-group()[1]"/>
</xsl:for-each-group>
</g>
</xsl:for-each-group>
</out>
</xsl:template>
<xsl:function name="k2:iin">
<xsl:param name="i"/> <!-- current catch -->
<xsl:param name="in"/> <!-- const catch scope -->
<xsl:sequence select="
let $xi:=$in/g[i = $i]/i return
if($xi[not(. = $i)])then
k2:iin($xi,$in) else
$xi
"/>
</xsl:function>
Expected output:
<out>
<g>
<i>8</i>
<i>2</i>
<i>4</i>
<i>5</i>
</g>
<g>
<i>6</i>
<i>7</i>
</g>
</out>
As well as the suggestions made in comments, you could replace the inner xsl:for-each-group by
<xsl:for-each select="distinct-values(current-group())">
<i><xsl:value-of select="."/></i>
</xsl:for-each>
Note, though, that distinct-values() doesn't guarantee to retain order, whereas xsl:for-each-group does. So there's no real benefit over your approach (but you did ask for alternatives...)
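The property that matters in this comparison is de-duplication that keeps first-occurrence order. As a rough illustration outside XSLT, here is a Python sketch; the function name is an assumption. dict.fromkeys keeps insertion order, whereas a plain set makes no ordering promise, which is loosely analogous to the distinct-values() caveat.

```python
def distinct_in_order(values):
    """Remove duplicates while keeping the first occurrence of each value."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(values))
```

Applied to the items of the first merged group in document order (8, 2, 2, 4, 4, 5), this gives 8, 2, 4, 5, the same order as the expected output.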
As the question says xslt2+, I looked for a compact or elegant XSLT 3 approach. It seems you don't really need grouping; you could just store a sequence of arrays of integers. That said, my attempt at a recursive approach using fold-left has not turned out compact or elegant. I post it just to show the attempt:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="data">
<xsl:copy>
<xsl:variable name="i-value-groups"
select="fold-left(g, (), function($value, $g) {
let $i-values := distinct-values($g/i/xs:integer(.)),
$group := $value[?* = $i-values],
$group-pos := for $pos in 1 to count($value) return $pos[exists($value[$pos][?* = $i-values])]
return
if (exists($group))
then (subsequence($value, 1, $group-pos - 1), array { distinct-values(($group?*, $i-values)) }, subsequence($value, $group-pos + 1))
else ($value, array { $i-values })
}
)"/>
<xsl:for-each select="$i-value-groups">
<g>
<xsl:for-each select="?*">
<i>{.}</i>
</xsl:for-each>
</g>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>
This assumes an input sample like
<data>
<g>
<i>8</i>
<i>2</i>
</g>
<g>
<i>2</i>
<i>4</i>
</g>
<g>
<i>4</i>
<i>5</i>
</g>
<g>
<i>6</i>
<i>7</i>
</g>
</data>
Of course, if the g elements themselves, rather than the plain integers, need to be "grouped", you could use the same fold-left approach returning a sequence of arrays or, as in the following sample, a sequence of maps (parcels or records in XSLT/XPath 4):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:output method="xml" indent="yes"/>
<xsl:function name="mf:collect-parcels" as="map(xs:string, element(g)*)*">
<xsl:param name="input-sequence" as="element(g)*"/>
<xsl:sequence
select="fold-left(
$input-sequence,
(),
function($parcel-ac, $g) {
let $i-elements := $g/i,
$matching-parcel-pos :=
for $pos in 1 to count($parcel-ac)
return $pos[exists($parcel-ac[$pos][?value/i = $i-elements])],
$matching-parcel := $parcel-ac[$matching-parcel-pos]
return
if (exists($matching-parcel))
then (subsequence($parcel-ac, 1, $matching-parcel-pos - 1), map:entry('value', ($matching-parcel?value, $g)), subsequence($parcel-ac, $matching-parcel-pos + 1))
else ($parcel-ac, map:entry('value', $g))
}
)"/>
</xsl:function>
<xsl:template match="data">
<xsl:copy>
<xsl:for-each select="mf:collect-parcels(g)">
<g>
<xsl:for-each-group select="?value/i" group-by=".">
<xsl:copy-of select="."/>
</xsl:for-each-group>
</g>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>
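The fold idea used in both stylesheets can be sketched compactly in Python with functools.reduce. This is an illustration under stated assumptions, not a translation of the XSLT: the names merge_step and merge_groups are hypothetical, groups are plain lists, and the sketch also merges several accumulated groups at once when a new group overlaps more than one of them.

```python
from functools import reduce

def merge_step(groups, items):
    """Fold step: merge `items` into every accumulated group it overlaps."""
    items = list(dict.fromkeys(items))  # de-duplicate, keep order
    hits = [g for g in groups if set(g) & set(items)]
    if not hits:
        return groups + [items]  # no overlap: start a new group
    # Union of all overlapping groups plus the new items, order preserved.
    merged = list(dict.fromkeys([x for g in hits for x in g] + items))
    result, placed = [], False
    for g in groups:
        if set(g) & set(items):
            if not placed:  # put the merged group where the first hit was
                result.append(merged)
                placed = True
        else:
            result.append(g)
    return result

def merge_groups(groups):
    """Left fold over all groups, starting from an empty accumulator."""
    return reduce(merge_step, groups, [])
```

For the sample data, merge_groups([[8, 2], [2, 4], [4, 5], [6, 7]]) produces the two groups from the expected output.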
I realized that the grouping can be done at the level of the g elements themselves, so working with sequences of i items is unnecessary:
<xsl:template match="/">
<out>
<xsl:for-each-group
select="$in/g"
group-by="k2:g(.)[1]"
>
<g>
<xsl:for-each-group
select="current-group()/i"
group-by="."
>
<xsl:copy-of select="."/>
</xsl:for-each-group>
</g>
</xsl:for-each-group>
</out>
</xsl:template>
<xsl:function name="k2:g">
<xsl:param name="g"/>
<xsl:sequence select="
let $xg:=$g[1]/../g[i[. = $g/i]] return
if(count($xg) gt count($g))then k2:g($xg) else
$xg
"/>
</xsl:function>
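The recursive function above computes a fixed point: it keeps adding every group that shares an item with the groups collected so far, and stops once the count no longer grows. A small Python sketch of that idea follows; the name closure and the list-based data are assumptions for illustration, and a loop replaces the recursion.

```python
def closure(start, groups):
    """Return all groups transitively connected to `start`, in input order."""
    current = [start]
    while True:
        items = {x for g in current for x in g}
        # Every group sharing at least one item with the current set.
        expanded = [g for g in groups if items & set(g)]
        if len(expanded) == len(current):
            return expanded  # nothing new was added: fixed point reached
        current = expanded
```

Because the result is in input order, its first member can serve as the grouping key, just as [1] is taken in the group-by expression.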
To avoid the overhead of the recursive call in every group-by evaluation, you can first collect, in a map, all the groups that get absorbed into an earlier group. (The price, of course, is a more complicated solution.)
<xsl:template match="/">
<xsl:variable name="gmap" select="k2:gmap(1,$in/g[1],map{})"/>
<xsl:message select="$gmap"/>
<out>
<xsl:comment select="'+gmap()'"/>
<xsl:for-each-group
select="$in/g"
group-by="
let $x:=(1 + count(preceding-sibling::g)) return
($gmap($x),$x)[1]
"
>
<g>
<xsl:for-each-group
select="current-group()/i"
group-by="."
>
<xsl:copy-of select="."/>
</xsl:for-each-group>
</g>
</xsl:for-each-group>
</out>
</xsl:template>
<xsl:function name="k2:g">
<xsl:param name="g"/>
<xsl:sequence select="
let $xg:=$g[1]/../g[i[. = $g/i]] return
if(count($xg) gt count($g))then k2:g($xg) else
$xg
"/>
</xsl:function>
<xsl:function name="k2:gmap">
<xsl:param name="gpos"/>
<xsl:param name="g"/>
<xsl:param name="gmap"/>
<xsl:sequence select="
if(empty($g))then $gmap else
let $xg:=if($gmap($gpos))then () else k2:g($g) return
k2:gmap(
$gpos + 1
,$g/following-sibling::g[1]
,if(empty($xg[2]))then $gmap else
map:merge(
($gmap
,for $x in subsequence($xg,2) return
map:entry(1 + count($x/preceding-sibling::g),$gpos)
)
)
)
"/>
</xsl:function>

How to remove duplicate entry - XSLT

I am trying to remove duplicate entries after the § character. If an entry contains a comma-separated list and a token (after tokenizing) starts with a round bracket, that token should inherit the prefix of the first entry; e.g. 17200(b)(2), (4)–(6) should become <p>17200(b)(2)</p><p>17200(b)(4)–(6)</p>.
Input XML
<root>
<p>CC §1(a), (b), (c)</p>
<p>Civil Code §1(a), (b)</p>
<p>CC §§2(a)</p>
<p>Civil Code §3(a)</p>
<p>CC §1(c)</p>
<p>Civil Code §1(a), (b), (c)</p>
<p>Civil Code §17200(b)(2), (4)–(6), (8), (12), (16), (20), and (21)</p>
</root>
Expected Output
<root>
<sec specific-use="CC">
<title content-type="Sta_Head3">CIVIL CODE</title>
<p>1(a)</p>
<p>1(b)</p>
<p>1(c)</p>
<p>2(a)</p>
<p>3(a)</p>
<p>17200(b)(2)</p>
<p>17200(b)(4)–(6)</p>
<p>17200(b)(8)</p>
<p>17200(b)(12)</p>
<p>17200(b)(16)</p>
<p>17200(b)(20)</p>
<p>17200(b)(21)</p>
</sec>
</root>
XSLT Code
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="p[(starts-with(., 'CC ') or starts-with(., 'Civil Code'))]" group-by="replace(substring-before(., ' §'), 'Civil Code', 'CC')">
<xsl:text>
</xsl:text>
<sec specific-use="{current-grouping-key()}">
<xsl:text>
</xsl:text>
<title content-type="Sta_Head3">CIVIL CODE</title>
<xsl:for-each-group select="current-group()" group-by="replace(substring-after(., '§'), '§', '')">
<xsl:sort select="replace(current-grouping-key(), '[^0-9.].*$', '')" data-type="number" order="ascending"/>
<xsl:for-each
select="distinct-values(
current-grouping-key() !
(let $tokens := tokenize(current-grouping-key(), ', and |, | and ')
return (head($tokens), tail($tokens) ! (substring-before(head($tokens), '(') || .)))
)" expand-text="yes">
<p>{.}</p>
</xsl:for-each>
</xsl:for-each-group>
</sec>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
You could do it like this, in a two-step approach: first compute the list of all entries, then use xsl:for-each-group to remove duplicates.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="listP">
<xsl:apply-templates select="root/p"/>
</xsl:variable>
<xsl:for-each-group select="$listP" group-by="p">
<p><xsl:value-of select="current-grouping-key()"/></p>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="p">
<xsl:variable name="input" select="replace(substring-after(.,'§'),'§','')"/>
<xsl:variable name="chapter" select="substring-before($input,'(')"/>
<xsl:for-each select="tokenize(substring-after($input, $chapter),',')">
<p><xsl:value-of select="concat($chapter,replace(replace(.,' ',''),'and',''))"/></p>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
See it working here : https://xsltfiddle.liberty-development.net/gVrvcxQ
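To make the intended expansion rule concrete, here is a hedged Python sketch of the logic the question describes: split each citation after the last §, let bracketed tokens inherit the prefix of the first entry, then de-duplicate and sort by section number. The function names are hypothetical, and the prefix rule (the first entry minus its last bracketed part) is an assumption inferred from the expected output.

```python
import re

def expand(citation):
    """Expand '17200(b)(2), (4)' style lists into full entries."""
    body = citation.rsplit("§", 1)[-1].strip()
    tokens = re.split(r", and |, | and ", body)
    head = tokens[0]
    # Assumed rule: tail tokens replace the last bracketed part of the head.
    prefix = re.sub(r"\([^()]*\)$", "", head)
    return [head] + [prefix + t if t.startswith("(") else t
                     for t in tokens[1:]]

def unique_sections(paragraphs):
    """All expanded entries, duplicates removed, sorted by section number."""
    seen = dict.fromkeys(e for p in paragraphs for e in expand(p))
    # sorted() is stable, so entries of one section keep first-seen order.
    return sorted(seen, key=lambda s: int(re.match(r"\d+", s).group()))
```

Run over the text of the seven p elements from the question, this yields the twelve entries listed in the expected output, in the same order.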

How to use case replacement pattern with Xpath replace function

I have this regexp and substitution pattern (demo) and need to use it in an XPath context with the fn:replace function, but I can't figure out how to write the replacement string correctly. Is it possible?
My naive attempt was
replace ("dsfjkljsdfjlsjdfABCDdfsfsdff",
"(\p{Lu})(\p{Lu}+)",
"$1\L$2")
but it complains with FORX0004 : Invalid replacement string in replace() : \ character must be followed by \ or $
I think you want e.g.
<xsl:function name="mf:lower-case-match">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="regex" as="xs:string"/>
<xsl:analyze-string select="$input" regex="{$regex}">
<xsl:matching-substring>
<xsl:value-of select="concat(regex-group(1), lower-case(regex-group(2)))"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:function>
mf:lower-case-match("dsfjkljsdfjlsjdfABCDdfsfsdff", "(\p{Lu})(\p{Lu}+)")
or, to use the as="xs:string" as the declared function type:
<xsl:function name="mf:lower-case-match" as="xs:string">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="regex" as="xs:string"/>
<xsl:value-of>
<xsl:analyze-string select="$input" regex="{$regex}">
<xsl:matching-substring>
<xsl:value-of select="concat(regex-group(1), lower-case(regex-group(2)))"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:value-of>
</xsl:function>
You need to declare a namespace for any user-defined function e.g. xmlns:mf="http://example.com/mf" on the xsl:stylesheet or xsl:transform root.
In XSLT 3 you could also simply push the result of the analyze-string function through a mode that then performs any transformation on the groups you want:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="text">
<xsl:copy>
<xsl:apply-templates select="analyze-string(., '(\p{Lu})(\p{Lu}+)')" mode="lower-case"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*:group[@nr = 2]" mode="lower-case">
<xsl:value-of select="lower-case(.)"/>
</xsl:template>
</xsl:stylesheet>
I don't think the \L case-conversion escape is supported in XPath. @Martin Honnen's answer is probably the best, but here's a pure XPath 2.0 solution:
With :
dsfjkljsdfjlsjdfABCDdfsfsdff
XPath :
replace(replace("dsfjkljsdfjlsjdfABCDdfsfsdff","(\p{Lu})(\p{Lu}+)","$1___$2___"),"_{3}.+_{3}",lower-case(substring-before(substring-after(replace("dsfjkljsdfjlsjdfABCDdfsfsdff","(\p{Lu})(\p{Lu}+)","$1___$2___"),"___"),"___")))
Description :
P1 : We add ___ to identify the lower-case part with :
replace("dsfjkljsdfjlsjdfABCDdfsfsdff","(\p{Lu})(\p{Lu}+)","$1___$2___")
P2 : We generate the lower case part with :
lower-case(substring-before(substring-after(resultofP1,"___"),"___"))
We join the two preceding expressions with :
replace(resultofP1,"_{3}.+_{3}",resultofP2)
Output :
dsfjkljsdfjlsjdfAbcddfsfsdff
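All of the approaches above boil down to "compute the replacement from the match" rather than writing it as a static string. For comparison, a minimal Python sketch of that idea uses a callback with re.sub. Two assumptions: the function name is hypothetical, and because Python's standard re module has no \p{Lu}, the ASCII class [A-Z] is used instead, so non-ASCII uppercase letters are not handled.

```python
import re

def lower_case_match(text):
    """Keep the first capital of each run, lower-case the rest of the run."""
    return re.sub(
        r"([A-Z])([A-Z]+)",  # ASCII stand-in for (\p{Lu})(\p{Lu}+)
        lambda m: m.group(1) + m.group(2).lower(),
        text,
    )
```

Unlike the nested replace() expression, the callback is applied to every match, so inputs with several uppercase runs are handled as well.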

Replace utf with unicode using xslt with help of database.xml

I want to replace all UTF-8 characters with their Unicode values using XSLT, as in the example below.
My XML contains many such characters, and I want to replace them with the help of a database file that contains the UTF-8 and Unicode values.
Please refer to the example below.
database.xml:-
<entities>
<entity utf8="°" unicode="x00B0" iso="deg" latin1="176"/>
<entity utf8="í" unicode="x00ED" iso="iacute" latin1="237"/>
<entity utf8="é" unicode="x00E9" iso="eacute" latin1="233"/>
<entity utf8="ó" unicode="x00F3" iso="oacute" latin1="243"/>
<entity utf8="â¢" unicode="x2062" iso="InvisibleTimes" latin1="Not Available"/>
</entities>
input:-
<article>
<documentinfo>
<title lang="eng">First report on the contribution of small-sized species to the copepod community structure of the southern Patagonian shelf (Argentina, 47-55°S)</title>
<author>
<lastname>Julieta</lastname>
<firstname>Carolina</firstname>
<middlename>Antacli</middlename>
<fullname>Carolina Antacli Julieta</fullname>
<corresponding>yes</corresponding>
<email>James#gmail.com</email>
<affiliation>Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). Av. Rivadavia 1917, C1033AAJ, Buenos Aires, Argentina,</affiliation>
<affiliation>Instituto Nacional de Investigación y Desarrollo Pesquero (INIDEP). Paseo Victoria Ocampo 1, B7602HSA, Mar del Plata, Argentina</affiliation>
<affiliation>Instituto de Investigaciones Marinas y Costeras (IIMYC), CONICET-Universidad Nacional de Mar del Plata, Argentina</affiliation>
</author>
Output:-
<article>
<documentinfo>
<title lang="eng">First report on the contribution of small-sized species to the copepod community structure of the southern Patagonian shelf (Argentina, 47-55°S)</title>
<author>
<lastname>Julieta</lastname>
<firstname>Carolina</firstname>
<middlename>Antacli</middlename>
<fullname>Carolina Antacli Julieta</fullname>
<corresponding>yes</corresponding>
<email>James#gmail.com</email>
<affiliation>Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). Av. Rivadavia 1917, C1033AAJ, Buenos Aires, Argentina,</affiliation>
<affiliation>Instituto Nacional de Investigación y Desarrollo Pesquero (INIDEP). Paseo Victoria Ocampo 1, B7602HSA, Mar del Plata, Argentina</affiliation>
<affiliation>Instituto de Investigaciones Marinas y Costeras (IIMYC), CONICET-Universidad Nacional de Mar del Plata, Argentina</affiliation>
</author>
</documentinfo>
</article>
You can download all files from:- http://www.stylusstudio.com/SSDN/upload/Entities-Replacement.zip also.
If you want to change the encoding of a file, you had better use a tool like iconv, e.g.
iconv -f UTF-8 -t UCS-2LE input_UTF8.xml > output_UCS.xml
This information on internationalization and encoding might be useful as well: http://docstore.mik.ua/orelly/xml/jxslt/ch08_06.htm
That said, a character map does work with the input.xml you've uploaded:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:character-map name="specialsigns">
<xsl:output-character character="°" string="x00B0"/>
<xsl:output-character character="í" string="x00ED"/>
<xsl:output-character character="é" string="x00E9"/>
<xsl:output-character character="ó" string="x00F3"/>
</xsl:character-map>
<xsl:output indent="yes" method="xml" use-character-maps="specialsigns"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This is just a modified version of the stylesheet from "replace a string with a string with xslt".
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vReps" select="document('entities.xml')/entities/*"/>
<xsl:template match="*|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" name="replace">
<xsl:param name="pText" select="."/>
<xsl:choose>
<xsl:when test="not($vReps/@utf8[contains($pText, .)])">
<xsl:value-of select="$pText"/>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="multiReplace">
<xsl:with-param name="pText" select="$pText"/>
<xsl:with-param name="pReps"
select="$vReps[contains($pText, @utf8)]"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="multiReplace">
<xsl:param name="pText"/>
<xsl:param name="pReps"/>
<xsl:choose>
<xsl:when test="$pReps">
<xsl:variable name="escaped">
<xsl:value-of select="concat('&amp;#', $pReps[1]/@unicode, ';')" disable-output-escaping="yes"/>
</xsl:variable>
<xsl:variable name="vRepResult">
<xsl:call-template name="singleReplace">
<xsl:with-param name="pText" select="$pText"/>
<xsl:with-param name="pOld" select="$pReps[1]/@utf8"/>
<xsl:with-param name="pNew" select="$escaped"/>
</xsl:call-template>
</xsl:variable>
<xsl:call-template name="multiReplace">
<xsl:with-param name="pText" select="$vRepResult"/>
<xsl:with-param name="pReps" select="$pReps[position() >1]"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$pText"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="singleReplace">
<xsl:param name="pText"/>
<xsl:param name="pOld"/>
<xsl:param name="pNew"/>
<xsl:if test="$pText">
<xsl:choose>
<xsl:when test="not(contains($pText, $pOld))">
<xsl:value-of select="$pText"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring-before($pText, $pOld)"/>
<xsl:value-of select="$pNew" disable-output-escaping="yes"/>
<xsl:call-template name="singleReplace">
<xsl:with-param name="pText" select="substring-after($pText, $pOld)"/>
<xsl:with-param name="pOld" select="$pOld"/>
<xsl:with-param name="pNew" select="$pNew"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
When applied to your inputs, I get:
<article>
<documentinfo>
<title lang="eng">First report on the contribution of small-sized species to the copepod community structure of the southern Patagonian shelf (Argentina, 47-55&#x00B0;S)</title>
<author>
<lastname>Julieta</lastname>
<firstname>Carolina</firstname>
<middlename>Antacli</middlename>
<fullname>Carolina Antacli Julieta</fullname>
<corresponding>yes</corresponding>
<email>James#gmail.com</email>
<affiliation>Consejo Nacional de Investigaciones Cient&#x00ED;ficas y T&#x00E9;cnicas (CONICET). Av. Rivadavia 1917, C1033AAJ, Buenos Aires, Argentina,</affiliation>
<affiliation>Instituto Nacional de Investigaci&#x00F3;n y Desarrollo Pesquero (INIDEP). Paseo Victoria Ocampo 1, B7602HSA, Mar del Plata, Argentina</affiliation>
<affiliation>Instituto de Investigaciones Marinas y Costeras (IIMYC), CONICET-Universidad Nacional de Mar del Plata, Argentina</affiliation>
</author>
</documentinfo>
</article>
All credit goes to Dimitre Novatchev.
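The table-driven replacement performed by the stylesheet can be summarised in a few lines of Python. This is a sketch under assumptions: the names TABLE and to_char_refs are hypothetical, and the table is keyed by the actual single characters (degree sign, i-acute, e-acute, o-acute), not by the multi-character sequences shown in the question's database.xml.

```python
# Hypothetical lookup table: character -> value of the "unicode" attribute.
TABLE = {
    "\u00b0": "x00B0",
    "\u00ed": "x00ED",
    "\u00e9": "x00E9",
    "\u00f3": "x00F3",
}

def to_char_refs(text, table):
    """Replace each listed character with a numeric character reference."""
    # Single pass over the text, so inserted references are never rescanned.
    return "".join(
        "&#" + table[ch] + ";" if ch in table else ch
        for ch in text
    )
```

Characters that are not in the table pass through unchanged, which matches the behaviour of the recursive templates for text without any listed character.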

How to do this in XSLT without incrementing variables? (Tweaking Xalan to create a global XSLT iterator. Do I have other options?)

I'm trying to think functional, in XSLT terms, as much as possible, but in this case, I really don't see how to do it without tweaking. I have roughly this data structure:
<transactions>
<trx>
<text>abc</text>
<text>def</text>
<detail>
<text>xxx</text>
<text>yyy</text>
<text>zzz</text>
</detail>
</trx>
</transactions>
Which I roughly want to flatten into this form
<row>abc</row>
<row>def</row>
<row>xxx</row>
<row>yyy</row>
<row>zzz</row>
But the tricky thing is: I want to create chunks of 40 text-rows and transactions mustn't be split across chunks. I.e. if my current chunk already has 38 rows, the above transaction would have to go into the next chunk. The current chunk would need to be filled with two empty rows to complete the 40:
<row/>
<row/>
In imperative/procedural programming, it's very easy: just create a global iterator variable counting to multiples of 40, and insert empty rows if needed (I have provided an answer showing how to tweak XSLT/Xalan to allow for such variables). But how do I do it with XSLT? N.B.: I'm afraid recursion is not possible considering the size of the data I'm processing... but maybe I'm wrong about that.
I. Here is an XSLT 1.0 solution (the XSLT 2.0 solution is much easier):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:param name="pChunkSize" select="8"/>
<xsl:param name="vChunkSize" select="$pChunkSize+1"/>
<xsl:variable name="vSheet" select="document('')"/>
<xsl:variable name="vrtfEmptyChunk">
<xsl:for-each select=
"($vSheet//node())[not(position() > $pChunkSize)]">
<row/>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="vEmptyChunk" select=
"ext:node-set($vrtfEmptyChunk)/*"/>
<xsl:variable name="vrtfDummy">
<delete/>
</xsl:variable>
<xsl:variable name="vDummy" select="ext:node-set($vrtfDummy)/*"/>
<xsl:template match="/*">
<chunks>
<xsl:call-template name="fillChunks">
<xsl:with-param name="pNodes" select="trx"/>
<xsl:with-param name="pCurChunk" select="$vDummy"/>
</xsl:call-template>
</chunks>
</xsl:template>
<xsl:template name="fillChunks">
<xsl:param name="pNodes"/>
<xsl:param name="pCurChunk"/>
<xsl:choose>
<xsl:when test="not($pNodes)">
<chunk>
<xsl:apply-templates mode="rename" select="$pCurChunk[self::text]"/>
<xsl:copy-of select=
"$vEmptyChunk[not(position() > $vChunkSize - count($pCurChunk))]"/>
</chunk>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="vAvailable" select=
"$vChunkSize - count($pCurChunk)"/>
<xsl:variable name="vcurNode" select="$pNodes[1]"/>
<xsl:variable name="vTrans" select="$vcurNode//text"/>
<xsl:variable name="vNumNewNodes" select="count($vTrans)"/>
<xsl:choose>
<xsl:when test="not($vNumNewNodes > $vAvailable)">
<xsl:variable name="vNewChunk"
select="$pCurChunk | $vTrans"/>
<xsl:call-template name="fillChunks">
<xsl:with-param name="pNodes" select="$pNodes[position() > 1]"/>
<xsl:with-param name="pCurChunk" select="$vNewChunk"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<chunk>
<xsl:apply-templates mode="rename" select="$pCurChunk[self::text]"/>
<xsl:copy-of select=
"$vEmptyChunk[not(position() > $vAvailable)]"/>
</chunk>
<xsl:call-template name="fillChunks">
<xsl:with-param name="pNodes" select="$pNodes"/>
<xsl:with-param name="pCurChunk" select="$vDummy"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="text" mode="rename">
<row>
<xsl:value-of select="."/>
</row>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document (based on the provided one, but with three trx elements):
<transactions>
<trx>
<text>abc</text>
<text>def</text>
<detail>
<text>xxx</text>
<text>yyy</text>
<text>zzz</text>
</detail>
</trx>
<trx>
<text>abc2</text>
<text>def2</text>
</trx>
<trx>
<text>abc3</text>
<text>def3</text>
<detail>
<text>xxx3</text>
<text>yyy3</text>
<text>zzz3</text>
</detail>
</trx>
</transactions>
the wanted, correct result (two chunks with size 8) is produced:
<chunks>
<chunk>
<row>abc</row>
<row>def</row>
<row>xxx</row>
<row>yyy</row>
<row>zzz</row>
<row>abc2</row>
<row>def2</row>
<row/>
</chunk>
<chunk>
<row>abc3</row>
<row>def3</row>
<row>xxx3</row>
<row>yyy3</row>
<row>zzz3</row>
<row/>
<row/>
<row/>
</chunk>
</chunks>
Do note:
The first two transactions contain 7 text elements in total, and they fit in one 8-place chunk.
The third transaction has 5 text elements and doesn't fit in the remaining space of the first chunk, so it is put in a new chunk.
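The chunk-filling rule described in these notes can be written down as a short accumulator loop. The following Python sketch is only an illustration of the rule, not of the XSLT: the function name is hypothetical, each transaction is a list of its row texts, an empty string stands for an empty row, and no transaction is assumed to be longer than one chunk.

```python
def chunk_rows(transactions, chunk_size):
    """Flatten transactions into rows, padding so none straddles a chunk."""
    rows = []
    for trx in transactions:
        used = len(rows) % chunk_size
        # Pad the current chunk if this transaction does not fit into it.
        if used and len(trx) > chunk_size - used:
            rows.extend([""] * (chunk_size - used))
        rows.extend(trx)
    # Pad the last chunk up to a full multiple of chunk_size.
    rows.extend([""] * (-len(rows) % chunk_size))
    return rows
```

With the three sample transactions and a chunk size of 8, the result has 16 rows: seven filled rows and one empty row in the first chunk, then five filled rows and three empty rows in the second.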
II. XSLT 2.0 Solution (using FXSL)
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:dvc-foldl-func="dvc-foldl-func"
exclude-result-prefixes="f dvc-foldl-func"
>
<xsl:import href="../f/func-dvc-foldl.xsl"/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:param name="pChunkSize" select="8"/>
<dvc-foldl-func:dvc-foldl-func/>
<xsl:variable name="vPadding">
<row/>
</xsl:variable>
<xsl:variable name="vFoldlFun" select="document('')/*/dvc-foldl-func:*[1]"/>
<xsl:template match="/">
<xsl:variable name="vpaddingChunk" select=
"for $i in 1 to $pChunkSize
return ' '
"/>
<xsl:variable name="vfoldlResult" select=
"f:foldl($vFoldlFun, (), /*/trx),
$vpaddingChunk
"/>
<xsl:variable name="vresultCount"
select="count($vfoldlResult)"/>
<xsl:variable name="vFinalResult"
select="subsequence($vfoldlResult, 1,
$vresultCount - $vresultCount mod $pChunkSize
)"/>
<result>
<xsl:for-each select="$vFinalResult">
<row>
<xsl:value-of select="."/>
</row>
</xsl:for-each>
<xsl:text>
</xsl:text>
</result>
</xsl:template>
<xsl:template match="dvc-foldl-func:*" mode="f:FXSL">
<xsl:param name="arg1"/>
<xsl:param name="arg2"/>
<xsl:variable name="vCurCount" select="count($arg1)"/>
<xsl:variable name="vNewCount" select="count($arg2//text)"/>
<xsl:variable name="vAvailable" select=
"$pChunkSize - $vCurCount mod $pChunkSize"/>
<xsl:choose>
<xsl:when test="$vNewCount le $vAvailable">
<xsl:sequence select="$arg1, $arg2//text"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$arg1"/>
<xsl:for-each select="1 to $vAvailable">
<xsl:sequence select="$vPadding/*"/>
</xsl:for-each>
<xsl:sequence select="$arg2//text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the same XML document (above), the same correct, wanted result is produced:
<result>
<row>abc</row>
<row>def</row>
<row>xxx</row>
<row>yyy</row>
<row>zzz</row>
<row>abc2</row>
<row>def2</row>
<row/>
<row>abc3</row>
<row>def3</row>
<row>xxx3</row>
<row>yyy3</row>
<row>zzz3</row>
<row> </row>
<row> </row>
<row> </row>
</result>
Do note:
The use of the f:foldl() function.
A special DVC (Divide and Conquer) variant of f:foldl() so that recursion stack overflow is avoided for all practical purposes -- for example, the maximum recursion stack depth for 1000000 (1M) trx elements is just 19.
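The divide-and-conquer idea behind such a fold can be sketched as follows. This is not FXSL's implementation, only a minimal Python illustration with a hypothetical function name: the accumulator is threaded through the left half and then the right half, so results match an ordinary left fold while the recursion depth grows with the logarithm of the input length instead of the length itself.

```python
def dvc_foldl(f, acc, xs, lo=0, hi=None):
    """Left fold over xs[lo:hi] that recurses on halves, not on the tail."""
    if hi is None:
        hi = len(xs)
    n = hi - lo
    if n == 0:
        return acc
    if n == 1:
        return f(acc, xs[lo])
    mid = lo + n // 2
    # Fold the left half first, then continue with the right half.
    left = dvc_foldl(f, acc, xs, lo, mid)
    return dvc_foldl(f, left, xs, mid, hi)
```

Because the items are still combined strictly left to right, an order-sensitive step function such as the chunk-padding one behaves the same as under a plain fold.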
Build the complete XML data structure you need in Java. Then do a simple iteration in XSLT over the prepared XML.
You might save a lot of effort and end up with a more maintainable solution.
As promised, here is a simplified example showing how Xalan can be tweaked to allow incrementing such global iterators:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="xalan://com.example.Functions">
<!-- the global row counter variable -->
<xsl:variable name="row" select="0"/>
<xsl:template match="trx">
<!-- wherever needed, the $row variable can be globally incremented -->
<xsl:variable name="iteration" select="f:increment('row')"/>
<!-- based upon this variable, calculations can be made -->
<xsl:variable name="remaining-rows-in-chunk"
select="40 - (($iteration - 1) mod 40) "/>
<xsl:if test="count(.//text) > $remaining-rows-in-chunk">
<xsl:call-template name="empty-row">
<xsl:with-param name="rows" select="$remaining-rows-in-chunk"/>
</xsl:call-template>
</xsl:if>
<!-- process transaction now, that previous chunk has been filled [...] -->
</xsl:template>
<xsl:template name="empty-row">
<xsl:param name="rows"/>
<xsl:if test="$rows > 0">
<row/>
<xsl:variable name="dummy" select="f:increment('row')"/>
<xsl:call-template name="empty-row">
<xsl:with-param name="rows" select="$rows - 1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
And the contents of com.example.Functions:
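package com.example;

import java.lang.reflect.Field;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.xalan.extensions.ExpressionContext;
import org.apache.xml.utils.QName;
import org.apache.xpath.objects.XNumber;

public class Functions {
// The original snippet used an unspecified logger; java.util.logging is assumed here
private static final Logger log = Logger.getLogger(Functions.class.getName());

public static String increment(ExpressionContext context, String nodeName) {
XNumber n = null;
try {
// Access the $row variable
n = ((XNumber) context.getVariableOrParam(new QName(nodeName)));
// Make it "mutable" using this tweak. I feel horrible about
// doing this, though ;-)
Field m_val = XNumber.class.getDeclaredField("m_val");
m_val.setAccessible(true);
// Increment it
m_val.setDouble(n, m_val.getDouble(n) + 1.0);
} catch (Exception e) {
log.log(Level.SEVERE, "Error", e);
}
return n == null ? null : n.str();
}
}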